Factors associated with higher healthcare costs in individuals living with arthritis: evidence from the quantile regression approach.

OBJECTIVE
To examine the factors associated with higher healthcare cost in women with arthritis, using generalized linear models (GLMs) and quantile regression (QR).


METHODS
This is a cross-sectional healthcare cost study of individuals with arthritis that focused on older Australian women. Cost data were drawn from the Medicare Australia datasets.


RESULTS
GLM results show that healthcare cost was significantly associated with various socio-demographic and health factors. Although QR analysis results show the same direction of association between these factors and healthcare cost as in the GLMs, they indicate progressively increased effect sizes at the 50th, 75th, 90th and 95th percentiles.


CONCLUSION
Findings suggest that traditional regression models such as GLMs, which assume that a single rate of change accurately describes the relationships between explanatory variables and healthcare costs across the entire distribution of cost, can produce biased results. QR should be considered in future healthcare cost research.

KEYWORDS: arthritis • Australia • claims database • economic burden • healthcare costs • patient-reported outcomes

Arthritis is a very common disease around the world [1,2]. It is a form of joint disorder that involves the inflammation of one or more joints [1]. The symptoms of arthritis include swelling, stiffness and pain in the affected joints [1]. There are more than 100 different forms of arthritis, and the most common forms are osteoarthritis, rheumatoid arthritis and gout [1,3,4]. The prevalence of arthritis increases with age [5,6]. Arthritis is also a gendered disease where women have higher prevalence [1,3,6] and more disability [3,5,7]. The resources dedicated to the management of arthritis are substantial. It has been estimated that the direct medical costs of arthritis are billions of dollars in Australia, Canada and the UK and tens of billions of dollars in more populated countries such as the USA [8].
In international cost of arthritis studies, statistical tools such as generalized linear models (GLMs) with logarithmic link function and gamma distribution for cost (also known as gamma regression) have been commonly used to calculate the adjusted cost of arthritis and/or to examine the associations between potential explanatory factors and the outcome variable (i.e., cost) [9][10][11][12][13][14]. These regression methods, similar to ordinary least squares regression, are used to estimate the effect size of the explanatory variable on costs, where the coefficient of an independent variable represents the estimated average rate of change of cost (i.e., the outcome) per unit change in the explanatory variable [15]. Gamma regression is selected to account for the positive skewness and heteroskedasticity usually found in cost data [13,14]. Both positive skewness and heteroskedasticity are common features of health economic data [16]. Positive skewness of healthcare costs means that the population is dominated by patients who have relatively low (less than the mean) costs, but some have markedly higher costs [13]. Heteroskedasticity means non-constant variance in the cost data [17]; it occurs when one subpopulation has more (or less) variability than the other [17], and may imply subpopulation heterogeneity. The gamma distribution has a probability density function that skews to the right and thus approximates the long tail distribution of costs [10]. It also has a non-constant variance function that approximates that of many health data [16]. However, if there are heterogeneous subgroups in the population (e.g., identifiable by their levels of healthcare utilization), then there may be different factors affecting the healthcare cost and/or to different degrees in the individuals of these subgroups.
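For readers less familiar with gamma regression, the model described above can be written out as follows. This is the standard textbook form, not a specification taken from the study: the symbols $y_i$ (annual cost of individual $i$), $x_i$ (explanatory variables), $\beta$ (coefficients), $\mu_i$ (conditional mean cost) and $\phi$ (dispersion) are notation assumed here for illustration.

```latex
\begin{align}
  \log \mathbb{E}[y_i \mid x_i] &= x_i^{\top}\beta
    \quad\Longleftrightarrow\quad \mu_i = \exp\!\left(x_i^{\top}\beta\right), \\
  \operatorname{Var}(y_i \mid x_i) &= \phi\,\mu_i^{2}, \\
  \frac{\partial \mu_i}{\partial x_{ij}} &= \beta_j\,\mu_i .
\end{align}
```

The second line shows how the variance is allowed to grow with the mean, which is what accommodates heteroskedasticity. The first and third lines show that each explanatory variable has a single coefficient $\beta_j$ that describes its effect on the conditional mean only, which is the limitation taken up in the next paragraph.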
A number of studies have shown that the effect of a factor on healthcare utilization (and costs) can be different in the lower cost subpopulations compared to the higher cost subpopulations [18][19][20]. It is, therefore, postulated that traditional regression methods (e.g., ordinary least squares and GLMs), which produce a single rate (i.e., the average) of change as indicated by the regression coefficient of each explanatory variable, may be incapable of accurately describing the relationships between the explanatory variable and the outcome (i.e., cost) across the entire cost distribution in the population.
A relatively new class of regression models called quantile regression (QR) has emerged and has been employed in analyses of economic outcomes in medical research [21]. QR differs from traditional regression by enabling modeling of any conditional quantile (such as percentiles) of the outcome variable (e.g., healthcare cost) [21]. Moreover, QR does not assume normality or homoskedasticity of the distribution of healthcare cost. Hence, QR may more accurately describe the relationships between the explanatory variables (such as socio-demographic, lifestyle and health factors) and healthcare costs across the entire distribution of cost. The objective of this study was to examine the relationships between potential explanatory variables and healthcare cost in women with arthritis by employing both traditional regression models and this emerging method. The specific research questions were: What are the significant explanatory variables (i.e., including socio-demographic, lifestyle and health factors) of healthcare cost in older women with arthritis, and the estimated effect sizes using GLMs? What are the estimated effects of the same set of variables using QR estimated at the 50th, 75th, 90th and 95th percentiles? How do the results from GLMs and QR compare?
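The idea of modeling a conditional quantile can be illustrated with the asymmetric "check" (pinball) loss that quantile regression minimizes. The sketch below is a minimal illustration with no explanatory variables, so the fitted value is a single constant; the function names and the cost figures are made up for this example and are not taken from the study.

```python
def check_loss(residual, tau):
    """Check (pinball) loss: under-predictions are weighted by tau,
    over-predictions by (1 - tau)."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def total_loss(values, candidate, tau):
    """Total check loss when every observation is predicted by one constant."""
    return sum(check_loss(v - candidate, tau) for v in values)


def best_constant(values, tau):
    """The minimiser is always attained at an observed value, so it is
    enough to search over the data points themselves."""
    return min(sorted(values), key=lambda c: total_loss(values, c, tau))


# Hypothetical right-skewed annual costs (AUD) for 11 individuals
costs = [500, 900, 1200, 1500, 2100, 2800, 3500, 5200, 9000, 14000, 20000]

median_fit = best_constant(costs, 0.5)   # minimising at tau = 0.5 recovers the median
p90_fit = best_constant(costs, 0.9)      # minimising at tau = 0.9 recovers an upper quantile
print(median_fit, p90_fit)
```

Replacing the constant with a linear function of the explanatory variables and minimizing the same loss gives the quantile regression coefficients at the chosen quantile. No distributional assumption about cost is needed for this step.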

Study design
This is a cross-sectional, population-based healthcare cost study using health survey data and the linked administrative data, where potential explanatory factors of healthcare costs are examined using both GLMs and QR statistical methods.

Study sample & data source
The Australian Longitudinal Study on Women's Health (ALSWH) is a longitudinal survey of over 40,000 women in three cohorts (born in 1921-1926, 1946-1951 and 1973-1978) that began in 1996 [22]. ALSWH is designed to investigate multiple factors that affect the health and well-being of Australian women [23]. Since arthritis is a gendered disease and women are particularly at risk, the ALSWH cohort provided an appropriate sample for this study.
ALSWH data were collected from the participants using self-administered questionnaires over a 3-year rolling schedule [22]. The information collected included participants' sociodemographic characteristics, health behavior and risk factors, information on general health and functional well-being, symptoms and medical condition status, and healthcare services utilization and satisfaction [23]. The current study focuses on the 1921-1926 birth cohort. Included in the analysis were women who completed Survey 5 (2008), self-reported arthritis, consented to the linkage of healthcare administrative data and had concessional status in the Pharmaceutical Benefits Scheme (PBS; described below) dataset during the enumeration period. In ALSWH surveys, women were asked, 'In the past 3 years, have you been diagnosed or treated for osteoarthritis, rheumatoid arthritis or other arthritis?' Arthritis status was confirmed when individuals answered 'yes' to this question for any form of arthritis.
Medicare Australia datasets, that is, the PBS and the Medicare Benefits Schedule (MBS) datasets, were the source for all healthcare utilization and costs data in this study. The ALSWH dataset was deterministically linked to these administrative datasets using a unique Personal Identification Number held by Medicare Australia [24]. A detailed protocol for data linkage has been described in the published literature [25].
The PBS dataset includes unit record data on claims for government-subsidized prescription medicines [26]. Each claim represents a unit of healthcare utilization. A PBS subsidy claim is made when the cost of a dispensed eligible prescription medicine exceeds the patient copayment [27]. The PBS dataset contains information on the type of medicine dispensed, the date the medicine was dispensed, the amount of medicine dispensed, customer's health insurance coverage, the full cost of the medicine, and the government and customer contribution of the full cost. For this study, only data for women who had concessional status during the enumeration period were included. Concessional status individuals (e.g., pensioners and veterans), as opposed to 'general' status individuals, pay a nominal copayment for their prescription medicines [28]. All PBS medicines cost more than the concessional status threshold and, thus, will always attract government subsidies. Consequently, these individuals have the least amount of missing information compared to the general population in the PBS dataset. Restricting the analysis to the subgroup of individuals whose data are most complete is a common way of handling subsidy administrative data [29][30][31][32].
The MBS is a listing of health services subsidized by the Australian Government [33]. MBS benefits are available to all Australian citizens and permanent residents. The MBS dataset includes unit record data on claims for health services listed in the MBS. These data include the type of health service provided (represented by the Medicare item number), information about the service provider (including his/her gender, state and specialty), the date of service, the service fee and the benefits paid (i.e., the MBS reimbursed amount).
The bottom-up costing method was used to enumerate all-cause healthcare costs incurred by each woman during the 12-month period under study. Costs are measured from the Australian Government's perspective. The enumeration period was from 1 April 2008 to 31 March 2009. All-cause costs included all healthcare costs for any reason. Unit costs for prescription medicines were the government's contribution to medicine costs recorded in the PBS database. Patient copayments were not included. Unit costs for health services were the paid benefits recorded in the MBS database. The sum of healthcare cost for each woman per year was calculated by multiplying the number of utilized healthcare units by the corresponding unit costs. All costs were expressed in 2012 Australian dollars (AUD). Conversion from 2008-2009 costs was based on the Consumer Price Index for the health group [34].
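The bottom-up calculation described above can be sketched as follows. This is an illustration only: the claim records, the function name and the inflation factor of 1.10 are hypothetical and do not come from the PBS or MBS data or from the index values used in the study.

```python
from collections import defaultdict


def annual_cost_per_person(claims, inflation_factor):
    """Sum (units x unit cost) for each person, then inflate to the reporting year."""
    totals = defaultdict(float)
    for person_id, units, unit_cost in claims:
        totals[person_id] += units * unit_cost
    return {pid: round(total * inflation_factor, 2) for pid, total in totals.items()}


# Hypothetical claims: (person id, number of units, government-paid unit cost in AUD)
claims = [
    ("A", 12, 25.00),   # e.g., 12 subsidized prescriptions
    ("A", 4, 35.00),    # e.g., 4 subsidized consultations
    ("B", 2, 100.00),
]

# Hypothetical ratio of the health price index in the reporting year to the
# index in the enumeration period; NOT the value used in the study.
costs_reporting_year = annual_cost_per_person(claims, inflation_factor=1.10)
print(costs_reporting_year)
```

Only the government contribution enters the unit cost here, which mirrors the costing perspective stated above; patient copayments would be left out of the claim records.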

Explanatory variables of healthcare cost
The explanatory variables of healthcare cost examined in this study were socio-demographic variables, lifestyle and other risk factors, and measures of health and other indicators for healthcare need. Selection of explanatory variables of healthcare cost in this study was guided by similar recent studies [35][36][37][38][39][40][41]. The socio-demographic variables consisted of area of residence, marital status, education level, difficulty managing on available income and health insurance coverage. Level of education was categorized into: less than secondary school, less than university and university or higher. In ALSWH surveys, participants were also asked: 'How do you manage on the income you have available?' Women were dichotomized into those with at least some difficulty managing on available income and those who answered 'easy' or 'not too bad'. The Department of Veterans' Affairs (DVA) issues repatriation health cards to veterans, their dependents and widows of veterans. DVA card holders are entitled to benefits for accepted war-caused disabilities treatments or any healthcare needs based on eligibility. Women were dichotomized into those with and without DVA coverage, respectively. Women were also categorized into: without private health insurance coverage, with private hospital coverage only, with private ancillary coverage only and both hospital and ancillary coverage accordingly. Private hospital covers private hospital charges and/or services not eligible for Medicare subsidies (e.g., laser eye surgery). Ancillary services coverage covers other charges including dental treatment, chiropractic treatment, physiotherapy and occupational therapy.
Lifestyle risk factors included obesity, alcohol drinking, current smoking status and physical activity level. Women were dichotomized into normal or underweight, and overweight or obese based on the BMI, where a BMI equal to or higher than 25 kg/m² was classified as overweight or obese [42]. Any alcohol use or tobacco smoking was classified as drinker or current smoker, respectively. Moreover, based on the validated Active Australia questions [43], women were categorized into: no physical activity (PA), low level of PA, moderate level of PA and high level of PA.
Health measures and other indicators for healthcare need included the presence of joint symptoms, use of complementary and alternative medicine or therapy, the Short Form-36 physical component score (SF-36 PCS) (i.e., a measure of physical health-related quality of life) [44] and the number of comorbid conditions. In the survey, participants were asked: 'Have you had (stiff or painful joints) in the last 12 months?' Participants who answered 'sometimes' or 'often' were classified as having joint symptoms. Women who consulted a physiotherapist, podiatrist/chiropodist, occupational therapist or 'alternative' health practitioner were classified as complementary and alternative medicine users. The number of comorbid conditions was a count of self-reported medical conditions, which included anxiety, asthma, bronchitis, cancer (any type other than skin), depression, diabetes, hypertension, heart diseases, osteoporosis and stroke.

Statistical analysis
Characteristics of the sample were described, where continuous variables were summarized by means and standard deviations, and categorical variables by proportions. Potential explanatory factors for healthcare costs were examined using: GLMs with logarithmic link function and gamma distribution for cost (i.e., gamma regression), and QR. Results obtained from the two strategies were compared. The final multiple GLM regression model was obtained using the stepwise backward elimination method. A p-value <0.05 was adopted for statistical significance. CIs around the mean of the effect size estimates in GLMs were determined using a bootstrapping method with 1000 repetitions. Furthermore, the effect sizes of the same explanatory variables as in the final GLM model were estimated using QR for the 50th, 75th, 90th and 95th percentiles. The same variables were selected in order to facilitate direct comparisons of effect estimates between the GLM and the QR models. The aforementioned percentiles were chosen for QR because it was presumed that the rate of change in costs per unit change in an explanatory variable (Δcost/Δx) would be progressively greater at higher percentiles [19]. One thousand bootstrapped samples were also run for each QR analysis. Data files were constructed using SAS version 9.3 (SAS Institute, Cary, NC, USA). Statistical analyses were performed in Stata IC version 11 (StataCorp LP, College Station, TX, USA).
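The bootstrapping step can be sketched in a few lines. The text does not state which bootstrap interval was used, so the percentile method shown here is an assumption, and it is applied to a simple statistic (the median of made-up costs) rather than to a regression coefficient. The function name, seed and data are hypothetical.

```python
import random
import statistics


def bootstrap_ci(values, statistic, repetitions=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for any statistic of a sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample individuals with replacement and recompute the statistic each time
    estimates = sorted(
        statistic([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(repetitions)
    )
    lower_index = round((alpha / 2) * repetitions)          # 25 with the defaults
    upper_index = round((1 - alpha / 2) * repetitions) - 1  # 974 with the defaults
    return estimates[lower_index], estimates[upper_index]


# Hypothetical right-skewed annual costs (AUD)
costs = [500, 900, 1200, 1500, 2100, 2800, 3500, 5200, 9000, 14000, 20000]
low, high = bootstrap_ci(costs, statistics.median)
print(low, high)
```

For a regression coefficient the same loop would refit the model on each resampled dataset and collect the coefficient instead of the median.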

Ethics approval
Ethical approvals were received from the University of Newcastle and the University of Queensland Human Research Ethics Committees.

Results
The flow of ALSWH participants from Survey 5 to the women finally included in the sample is illustrated in FIGURE 1. A total of 1345 women who self-reported doctor-diagnosed arthritis were included in the analysis. Characteristics of the sample are summarized in TABLE 1.
The mean healthcare cost in women with arthritis was (AUD 2012) $3864 per person per year (standard deviation [SD] $4073) and the median was $2780 (interquartile range $3028). The cost components (i.e., mean and median) are illustrated in SUPPLEMENTARY APPENDIX 1 (supplementary material can be found online at www.informahealthcare.com/suppl/10.158614737167.2015.1037833_suppl). The final GLM regression model estimated that women with arthritis who lived in an urban area had $975 higher healthcare costs on average per year (p < 0.001) compared to those who lived in a rural or remote area; women with DVA health insurance coverage incurred on average a $3313 increase in the annual healthcare cost (p < 0.001) compared to those without this cover; women with private hospital insurance had $897 more in government subsidized healthcare (p < 0.001) than those without this cover; and for each additional comorbid condition in women with arthritis, the healthcare cost increased on average by $427 (p < 0.001). The GLM regression model also estimated that women with arthritis who were single had $708 lower healthcare cost on average per year (p = 0.033) compared to those who were married or in a de facto relationship; and for each unit increase in the SF-36 PCS (i.e., better physical health-related quality of life) in women with arthritis, the healthcare cost decreased by $51 (p < 0.001). Details are shown in TABLE 2. QR analysis produced different effect size predictions, though in the same directions as the GLM results.
More specifically, the following were observed: at the 50th percentile, the estimated effect sizes for the area of residence, DVA and private hospital insurance coverage, and the SF-36 PCS were statistically significantly different (indicated by the 95% CIs) from those estimated by GLMs; and the estimated absolute effect sizes for the area of residence, marital status, types of health insurance coverage and SF-36 PCS increased across the 50th, 75th, 90th and 95th percentiles, while the effect size for comorbid conditions decreased across the percentiles. QR results are illustrated in TABLE 3.

Discussion
This study assessed the impact of socio-demographic, lifestyle risk and health factors on the healthcare cost in older women with arthritis using both the GLMs and QR statistical methods. Results show that using QR can produce significantly different results compared to the GLMs. The GLM results in this study are similar to the findings of previous studies, where higher healthcare costs in individuals with arthritis were found to be associated with living in an urban area (vs rural area) [9,35], additional health insurance coverage [35,45], increased number of comorbid conditions [46,47] and worse physical functioning [38,48]. However, direct comparisons of the estimated effect sizes (in dollars per unit change in the explanatory variable) between international studies are difficult (or may be inappropriate) because there are differences in ethnicity, epidemiology of arthritis and healthcare system between countries that can affect the results.
When the effect sizes of the same explanatory variables (as in the GLM model) were estimated using QR, it was observed that the effect sizes in the QR models varied with the order of percentiles. For example, QR models predict that for each unit increase in the SF-36 PCS, the healthcare cost decreases by $32, $56, $101 and $115 at the 50th, 75th, 90th and 95th percentile, respectively (TABLE 3). These findings support the notion that: the population consists of heterogeneous subgroups that can be characterized by their level of healthcare utilization (and thus cost), and the explanatory variables of healthcare cost can have different effects on the population subgroups. These results are similar to those obtained in other healthcare utilization studies [19,20]. In one study, it was found that supplementary health insurance had different effects on the number of doctor visits for subpopulations, after controlling for socio-demographic and health variables; where there was an increasing positive effect between the 0.25 and 0.70 quantiles of the healthcare utilization distribution [19]. Another study showed that gastrointestinal conditions, such as gastrointestinal cancer and liver disease, had different effects on the length of hospital stay, with increasing positive effects from the first to the second quartile of healthcare utilization distribution [20].
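How an effect can vary across percentiles is easiest to see in the simplest possible case: one binary explanatory variable and no other covariates. In that case the mean-based estimate reduces to the difference in group means, and the quantile regression estimate at a given percentile reduces to the difference between the two groups' quantiles at that percentile. The sketch below uses made-up costs and a hypothetical grouping ("with cover" vs "without cover"); it does not reproduce any figure from TABLE 3.

```python
import math
import statistics


def empirical_quantile(values, tau):
    """Order-statistic quantile: the smallest observed value with at least
    a fraction tau of the data at or below it."""
    ordered = sorted(values)
    index = max(math.ceil(tau * len(ordered)) - 1, 0)
    return ordered[index]


# Hypothetical annual costs (AUD) for two groups of 12 individuals each
without_cover = [400, 700, 1000, 1400, 1900, 2500, 3200, 4200, 5600, 8000, 9500, 12000]
with_cover = [600, 1100, 1600, 2300, 3200, 4400, 5900, 8000, 11500, 15000, 18000, 26000]

# A mean-based model summarises the group difference with one number
mean_gap = statistics.mean(with_cover) - statistics.mean(without_cover)

# A quantile-based view gives one difference per percentile
quantile_gaps = {
    tau: empirical_quantile(with_cover, tau) - empirical_quantile(without_cover, tau)
    for tau in (0.5, 0.75, 0.9, 0.95)
}
print(round(mean_gap, 2), quantile_gaps)
```

In this constructed example the gap between groups widens toward the upper tail, so the single mean difference overstates the effect for typical individuals and understates it for high-cost individuals. With several explanatory variables the quantile regression coefficients no longer have this closed form, but the interpretation is analogous.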
The findings of this study have two important implications for practice and policy. First, they highlight a limiting aspect of traditional regression methods (e.g., ordinary least squares and GLMs) in healthcare cost research, which assume a single rate of change can accurately describe the relationship between an explanatory variable and healthcare cost in the entire population with very different levels of healthcare utilization and costs. As the explanatory variables of healthcare cost can have different effects on population subgroups, traditional methods can produce biased results. Second, this study demonstrates an important advantage of using QR in cost studies; that is, QR produces estimates of the effect of explanatory variable(s) at any conditional quantile of the outcome variable (e.g., healthcare cost). This method can be very helpful to researchers who want to study specific location(s) of the cost distribution. Assessing how factors influence the healthcare cost of patients with mean (or median) costs may not be the only concern for researchers. For example, there may be special concern about the high-cost patients since these patients may represent individuals with the highest risk for health problems [49], special need individuals and beneficiaries for increased resource allocation [50][51][52], and/or targets for cost-containment measures [50][51][52]. Thus, the potential and advantages of QR in future healthcare cost research should not be ignored.
There are several limitations to this study. First, findings in this study might not be generalizable to men or to groups with different forms of arthritis. Arthritis is a gendered disease where women have higher prevalence [1,3,6] and more disability [3,5,7]; thus, a study with a particular focus on older women represents a very important step toward the understanding of the explanatory factors of healthcare costs in individuals with arthritis. Effect sizes of the variables affecting healthcare costs might also be different for specific forms of arthritis. However, the majority of women in this older-aged sample are expected to have osteoarthritis [1,53], and results in this study may be more comparable to those focusing on osteoarthritis [9,35] than other less prevalent forms of arthritis such as rheumatoid arthritis [38,45]. Future research may investigate the difference in effect sizes and factors associated with healthcare cost between specific forms of arthritis using a stratified analysis. Second, this study focused on healthcare costs from the perspective of the Australian Government and included only Medicare Australia data. The PBS and MBS are the two main health programs funded by the Commonwealth (Australian) Government [54]. Hospital costs were not included because the operation of public  [55]. Third, this study did not build independent regression models for different percentiles using QR. In this study, the QR models for the 50th, 75th, 90th and 95th percentiles included only the explanatory variables obtained from the final GLM model. Since different explanatory variables could have different effects on healthcare costs for patients at different locations of the cost distribution, different models could be obtained (i.e., using stepwise backward elimination) for different quantiles. However, one of the specific aims of this study was to compare QR estimates and estimates obtained from traditional regression models; choosing the same explanatory variables (as in the GLM model) therefore permitted direct comparison between QR and GLM estimates. In future healthcare utilization and cost studies, quantile-specific models should be constructed for more robust results.
There are several strengths to this study. First, the use of self-reported arthritis as the case definition contributes to the internal validity of the study as it is validated [56] and recommended for use with population-based survey data for surveillance purposes [57]. The major strength of this study, however, is that it used different regression techniques to assess the determinants of healthcare cost. By comparing QR results to GLM results, this study provided empirical evidence about the utility and advantage of QR in healthcare utilization and cost research, and demonstrated that QR can provide a more comprehensive (and more accurate) view of the associations between potential explanatory variables and cost than that obtained using traditional regression models. The findings from this study contribute to our current understanding of the usefulness of different econometric techniques for cost of arthritis research (e.g., methods most suitable for research that aims to study high-cost arthritis patients).

Conclusion
This study examined the socio-demographic, lifestyle and health factors associated with higher healthcare cost in older women with arthritis using traditional and more advanced econometric methods. The final GLM regression model predicted that higher healthcare cost in older women with arthritis was independently associated with living in an urban area, having DVA health and private hospital coverage, and an increased number of comorbid conditions. Concurrently, lower healthcare cost was found to be associated with being single and higher SF-36 PCS. However, QR predicted statistically significantly different results, where living in an urban area, having DVA health, or private hospital coverage has increasing positive effects on healthcare cost at the 50th, 75th, 90th and 95th percentiles; and a higher SF-36 PCS has increasing negative effects on the cost. These findings indicate that there are heterogeneous population subgroups that can be characterized by their particular level of healthcare cost, and whose costs are affected by different factors at different degrees. They further suggest the following: traditional regression methods that assume a single rate of change to accurately describe the relationship between an explanatory variable and the healthcare cost across the entire cost distribution may produce biased results, and the use of QR should be considered in future healthcare utilization and cost research as it enables estimates of the effect of explanatory variable(s) at any conditional quantile of cost.

Financial & competing interests disclosure
The authors have no relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties.
No writing assistance was utilized in the production of this manuscript.

Key issues
• In assessing the factors associated with healthcare costs, generalized linear models with logarithmic link function and gamma distribution for cost have been commonly used.
• Traditional regression methods such as generalized linear models assume a single rate of change that can accurately describe the relationship between an explanatory variable and the healthcare cost across the entire cost distribution.
• Individuals with different levels of healthcare utilization and cost may be heterogeneous groups; there may be different factors affecting the healthcare cost at different degrees in different groups.
• Our results from assessing the factors associated with higher healthcare cost in older Australian women with arthritis using quantile regression show that various significant factors (e.g., socio-demographic and health variables) have different effects on cost at different percentiles.
• These findings indicate that traditional regression methods may not accurately describe the relationships between explanatory variables and cost for the entire population.
• As quantile regression can be used to estimate the effect of explanatory variables at specific conditional quantiles of cost, its use should be considered in future health economic research.