Combined DES/SD model of breast cancer screening for older women, II: screening-and-treatment simulation

In the second article of a two-article sequence, the focus is on a simulation model for screening and treatment of breast cancer in U.S. women of age 65+. The first article details a natural-history simulation model of the incidence and progression of untreated breast cancer in a representative simulated population of older U.S. women, which ultimately generates a database of untreated breast cancer histories for individuals in the simulated population. Driven by the resulting database, the screening-and-treatment simulation model is composed of discrete-event simulation (DES) and system dynamics (SD) submodels. For each individual in the simulated population, the DES submodel simulates screening policies and treatment procedures to estimate the resulting survival rates and the costs of screening and treatment. The SD submodel represents the overall structure and operation of the U.S. system for detecting and treating breast cancer. The main results and conclusions are summarized, including a final recommendation for annual screening between ages 65 and 80. A discussion is also presented on how both the natural-history and screening-and-treatment simulations can be used for performance comparisons of proposed screening policies based on overall cost-effectiveness, the numbers of life-years and quality-adjusted life-years saved, and the main components of the total cost incurred by each policy.


Introduction
In a sequence of two articles, we develop and exploit a two-phase simulation modeling framework for evaluating the effectiveness of policies for screening and treatment of breast cancer in the growing population of U.S. women who are at least 65 years of age. Figure 1 depicts the overall structure of our two-phase simulation framework. Phase I is the focus of the initial article (Tejada et al., 2013b), encompassing a natural-history model of the incidence and progression of untreated breast cancer for randomly sampled individuals from the designated population of older U.S. women. The natural-history simulation is a Discrete-Event Simulation (DES) model that contains a population growth submodel as well as incidence, progression, and survival submodels. The primary output of the natural-history simulation is a database of older women whose untreated breast cancer histories are known, and these histories are critical inputs to the Phase II screening-and-treatment simulation that is the focus of this article.
The screening-and-treatment simulation integrates DES and System Dynamics (SD) modeling techniques to represent the following simultaneously: (i) the screening and treatment activities and the resulting progression of health states and incurred costs for each individual in the simulated population; and (ii) the population-level state variables (stocks) and their associated rates of change (flows) that govern the overall operation of the U.S. system for detecting and treating breast cancer.
The screening-and-treatment simulation model is composed of interacting submodels that respectively represent 0740-817X C 2014 "IIE" screening, treatment, survival and mortality, costing, and population growth. For each woman in the simulated population, the DES submodel represents her associated screening events, diagnostic procedures, and treatment results; however, the details of her behavior are subject to the influence of the population-level SD submodel, which focuses on pervasive factors (state variables) that affect her adherence to screening. Like the natural-history simulation model, the time step for the screening-and-treatment simulation model is one year; and each run spans the period 2001-2020. For each screening policy to be evaluated, the screening-and-treatment simulation calculates key performance measures from the record of detailed activities for each woman in the simulated population. The remainder of this article on the screening-andtreatment simulation model is organized as follows. Section 2 covers the DES submodel, which includes the following: (i) alternative screening policies; (ii) imperfect screening and diagnostic mechanisms; (iii) treatment of some patients with breast cancer and their subsequent survival; and (iv) the costs of screening exams, diagnostic exams, workup exams, and treatment. Section 3 details the operation of the overall simulation model, focusing on the components of the SD submodel and how they interact with the DES submodel. Section 4 summarizes the results of our simulation-based performance evaluation of alternative screening policies over the period 2012-2020. In Section 5 we summarize our main conclusions, including the final recommendation for annual screening between ages 65 and 80; and we end Section 5 by recapitulating the main contributions of this work to both breast cancer screening and simulation methodology. The Online Supplement to this article and Tejada (2012) contain complete details on the screening-and-treatment simulation model and the results of our experimental performance comparison of selected breast cancer screening policies.

DES submodel: screening, diagnostic procedures, survival, and costing
When the screening-and-treatment simulation model is invoked, a user interface is displayed that enables the user to select values for the primary design variables and runcontrol parameters using option buttons, check boxes, and drop-down combo boxes as detailed in the Online Supplement and on pp. 184-196 of Tejada (2012). Following the user's specification of the screening policy to be evaluated, women enter the screening-and-treatment simulation exactly as they entered the natural-history simulation. As discussed in Tejada et al. (2013b), individual attributes and cancer histories associated with women who entered the natural-history simulation are stored in a database in the order that those individuals entered the natural-history simulation. In the screening-and-treatment simulation, those individuals are then retrieved from the database in the same order and are reassigned their corresponding attributes and cancer histories so that they enter the screening-andtreatment simulation at the same points in simulated time that they entered the natural-history simulation. As discussed in Section 4 of this article (and in Chapter 4 of Tejada (2012)), we perform 10 runs of the screening-and-treatment simulation for each screening policy to be evaluated; and we use the method of common random numbers (Kelton et al., 2010) to sharpen the comparisons between different screening policies. Thus, the same 10 randomly sampled populations used in the natural-history simulation are re-created in the screening-and-treatment simulation; and to each individual in each simulated population, we apply separately each screening policy selected for comparison. This approach enables us to compute more precise point and confidence interval estimators for the mean differences in performance between selected screening policies. After her attributes and breast cancer history are initialized at the time she joins the simulated population in the screening-and-treatment model, each woman enters the screening submodel that represents all activities related to detection of breast cancer. The screening submodel implements the selected screening policy, samples the successive probabilities of adherence to each screening appointment for each individual, and determines the type of screening, diagnostic, and workup exams to perform on that individual as required. Whereas false-positive results and falsenegative results can occur for screening exams, diagnostic exams can have false positives but not false negatives, and workup exams such as biopsies are assumed to be perfect so that they yield only true positives and true negatives. If breast cancer is present in an individual, then as in the natural-history simulation (Tejada et al., 2013b, Section 5.2), the stage of breast cancer at diagnosis is determined according to the stochastic process formulated by Plevritis et al. (2007) to represent the progression of the disease.
The treatment submodel is relatively simple, as the details of treatment are not currently the focus of this article. Only women with a detected breast cancer enter the treatment submodel. Through consultation with breast cancer experts, we estimated the probabilities that such women are treated given their age and the presence of other comorbid diseases. If a woman diagnosed with breast cancer does not receive treatment, then in the screening-and-treatment simulation, her age at death and cause of death are identical to the corresponding outcomes in the natural-history simulation.
The survival submodel only processes women who are correctly diagnosed with breast cancer and are selected for treatment in the treatment submodel. For each woman in the survival submodel, we estimate an age at death resulting from breast cancer based on SEER data (National Cancer Institute, 2009) and an age at death resulting from other causes based on breast cancer-adjusted life tables (Rosenberg 2006); and we use the minimum of these two estimates to assign the woman's age at death and cause of death. In addition to computing the number of life-years saved, the survival submodel computes the number of qualityadjusted life-years (QALYs) saved based on utilities from the breast cancer literature.
Within the screening-and-treatment simulation model, the costing submodel keeps track of the costs incurred for screening exams, diagnostic exams, workup exams, and treatment of breast cancer. The sum of these costs is used to compute the cost-effectiveness of each alternative screening policy.

Screening
In each year, if a woman's screening policy recommends having a mammogram and she adheres to the policy, then she enters the screening submodel. Given that each woman's complete cancer history is known, cancer presence and tumor size are known at all times and therefore for all screening events. If the woman being screened does not have cancer, then ultimately the testing will reveal this, because false positives are possible for screening exams and diagnostic exams but not for workup exams. If the woman being screened does have cancer, then the tumor size at screening is used to determine the probability of detection by screening. If the tumor is detected by screening, then subsequent diagnostic imaging and biopsy exams will confirm the positive diagnosis. If the cancer is not detected by screening, then no diagnostic exams are performed, a falsenegative diagnosis is recorded, and the woman proceeds to the next screening appointment according to her screening policy. In the screening-and-treatment simulation, a diagnostic exam can only be initiated based on the result of a screening exam.
2.1.1. Screening policies A typical breast cancer screening policy consists of a screening interval (i.e., the time between successive screening tests) and the ages at which screening starts and stops. Since this work focuses only on women of age 65+, and the starting age for screening is about 40 for almost all U.S. women, in the screening-and-treatment simulation we assume that each woman's screening policy goes into effect at age 65 and remains in effect up to (but not including) the stopping age. A woman is eligible for screening in a given year if the following conditions are both satisfied: (i) her age is evenly divisible by the screening interval; and (ii) her age is at least 65 but less than the stopping age. For example, if the screening interval is 3 years and the stopping age is 80, then a woman is scheduled for screening at ages 66, 69, 72, 75, and 78. Similarly, if the screening interval is 5 years and the stopping age is 85, then a woman is scheduled for screening at ages 65, 70, 75, and 80.
In consultation with our breast cancer experts, we selected the following three screening policies for the period 2012-2020: interval screening, risk-based screening, and factor-based screening. Interval screening assigns the same starting and stopping ages and the same screening interval to every woman in the population. Risk-based screening assigns one of two selected screening intervals to each woman based on her level of risk for developing breast cancer (i.e., low or high risk). Factor-based screening assigns different screening intervals on an individual basis depending on the presence of a particular combination of breast cancer risk factors. For the "model calibration" period 2001-2011, the screening policy is assumed to be the interval screening policy that yields results most closely matching SEER data from that time period (National Cancer Institute, 2009).
For the period 2012-2020, the screening policy changes to a user-specified policy.
For the purposes of this work, it is necessary to consider past screening and future screening separately. In the past, the stopping age for screening was not well defined so women stopped screening at different ages. To account for the past variation in the stopping age for screening, we selected a set of beta distributions with common minimum and maximum values of 65 years and 100 years, respectively, and with modes ranging from 70 years to 100 years in increments of 5 years. This setup resulted in seven different distributions for stopping ages of individual women to be used in conjunction with past screening intervals during the period 2001-2011. In the future period 2012-2020, a stopping age is assigned to each woman in the simulated population based on the screening policy being evaluated; therefore, each woman has no additional screening tests at or after her assigned stopping age. During the period 2012-2020, the stopping age can take on any value from 70 years to 100 years in increments of 5 years, but the stopping age is independent of the individual's risk factors. Future work could include allowing the stopping age to be based on each individual's risk factors.
Interval screening: To each woman in the population, interval screening assigns the same starting and stopping ages and the same screening interval. We consider screening intervals of 1, 2, 3, 4, and 5 years. When combined with seven possibilities for stopping age, whether those possibilities are specified by beta distributions (for the period 2001-2011) or constants (for the period 2012-2020), this scheme yields 35 possible interval screening policies.
Risk-based screening: Risk-based screening was suggested by our breast cancer experts. To estimate each woman's overall risk of being diagnosed with breast cancer within 1 year of a screening exam, we used the logistic regression equations of Barlow et al. (2006). The woman's age is the only dynamic risk factor in the Barlow equations. All other risk factors remain constant. In the natural-history simulation, each woman receives a perfect annual screening exam but no treatment for any detected cancer so that her 1-year risk can be evaluated annually according to the Barlow equations and recorded; then over the course of the natural-history simulation, relevant percentiles of this 1year risk are estimated for the entire simulated population, as summarized in Table 1. In the screening-and-treatment simulation, the definition of high risk is permitted to vary, corresponding to the top 20%, 15%, 10%, and 5% of the distribution of the Barlow 1-year risk for women of age 65+ as respectively specified by the first four percentiles given in Table 1.
From the five screening intervals under consideration, all possible pairs are assigned to high-and low-risk women subject to the constraint that the screening interval for highrisk women must not exceed the screening interval for lowrisk women. Therefore, we have 15 feasible assignments of screening intervals to low-and high-risk women; and when these assignments are combined with four possible definitions of high-risk and seven potential stopping ages, we obtain 420 risk-based screening policies.
Factor-based screening: Factor-based screening is the most versatile and complex type of screening policy that the screening-and-treatment simulation can handle. As shown in Table 2, the Barlow risk equations include 11 risk factors (predictor variables). First, the user selects risk factors on which to base the screening interval, ranging from only one factor to all 11 factors. Next, a screening interval is specified for each level of each selected risk factor. The actual screening interval for an individual woman is then assigned as follows: where in general Freq{SFactj} denotes the screening interval corresponding to level j of risk factor SFact as described in Table 2. Considering all possible combinations of the 11 risk factors, together with the seven chosen stopping ages, we have 16 329 600 possible factor-based screening policies. Further allowing for the possibility of using any subset of the 11 risk factors, we see that the number of possible factor-based screening policies is so large as to preclude any attempt to optimize system performance by total enumeration. Thus, the development of an intelligent search method is needed to reduce our focus to the most cost-effective policy or group of policies. As detailed on pp. 214-224 of Tejada (2012), the search method exploits OptQuest for Arena (Kelton et al., 2010).

Screening procedures
Mammographic exams: Screening exams use either film or digital mammograms. These two types of screening tests are assumed to be identical with respect to their values of sensitivity (i.e., the probability of a positive screening test given that the woman has cancer) and specificity (i.e., the probability of a negative screening test given that the woman does not have cancer). This assumption is mainly due to the lack of data on the differences in sensitivity and specificity of the two screening tests; however, both sensitivity and specificity are affected by changes over time in the level of screening technology as described in Section 3. Therefore in the screening-and-treatment simulation, cost is the only difference between digital and film mammograms. Table 3 lists the total number of mammograms and the number of digital mammograms performed annually during the period 2001-2009 as reported by the Breast We concluded that the fourth-degree polynomial yielded the best fit to the available data based on these results as well as visual inspection of three graphs of the data on which each of the three fitted polynomials are superimposed separately (Figs. 2, 3, and 4 in the Online Supplement). The probability of a false-positive screening exam (i.e., 1 -specificity) is estimated from BCSC data and is dependent on each woman's age and the time since her last mammogram. These values are given in Table 4. A positive screening exam, either a true positive or a false positive, results in a follow-up diagnostic exam. Data on the probability of a true-positive screening exam (i.e., sensitivity), which depends on the tumor size at the time of screening, were provided by the CISNET breast cancer modeling group (Fryback et al., 2006) and are listed in Table 5. If a woman with breast cancer does not have a true-positive screening exam, then she has a false-negative exam. Figure 1 in the Online Supplement depicts the probability of tumor detection as a function of tumor size. If the screening exam is negative (either a true negative or a false negative), then the woman advances to the next year in the simulation as usual. For screening exams, both false positives and false negatives are affected by the level of breast cancer screening technology at the time of the exam. A detailed analysis of this effect is given in Section 3. Diagnostic exams: Given only to women who test positive for breast cancer by a screening exam, diagnostic exams are assumed to have perfect sensitivity-i.e., each woman's diagnostic exam has probability zero of yielding a falsenegative result. Each woman who tests positive for breast cancer by a diagnostic exam will also receive a workup exam, at which point the presence or absence of breast cancer will be confirmed. Diagnostic exams are not assumed to have perfect specificity-i.e., a woman's diagnostic exam has a nonzero probability of yielding a false-positive result that leads to a benign workup exam. The probability of a false-positive diagnostic exam is based on BCSC data and depends on age and time since the last mammogram. These values are given in Table 6. For costing purposes, three types of diagnostic exams are considered: mammography, ultrasound, and Magnetic Resonance Imaging (MRI). For a given patient, any combination of these diagnostic exams could be performed. For the period 2001-2007, Table 7 summarizes the percentages of patients receiving each combination of diagnostic exams based on data provided by the Carolina Mammography Registry (2013). The data from 2007 were used for the years 2010-2020 in the screening-and-treatment simulation.
Workup procedures. Encompassing open (surgical) biopsies, Core Needle Biopsies (CNBs), and (on a very small scale) Fine Needle Aspirations (FNAs), workup exams are perfect-that is, if breast cancer is present, then the workup exam will return a positive result; and if breast cancer is not present, then the workup exam will return a negative result. We assume that only one workup procedure is performed on each woman who receives a positive diagnostic exam; Table 8 shows the percentage of women receiving each type of workup exam. Our breast cancer experts consulted with their radiologists and used their own experience and judgment to provide the subjectively estimated percentages in Table 8.

Treatment submodel
For each woman diagnosed with breast cancer, the treatment submodel determines whether she will receive treatment by randomly sampling a Bernoulli random variable for which the success probability is based on her age and health status at the time of diagnosis. The relevant probabilities are listed in Table 9. Our breast cancer experts consulted with their radiologists and used their own experience and judgment to provide the subjective probability estimates in Table 9. Currently, the screening-and-treatment simulation only takes into account whether treatment occurs, without considering the type of treatment to be administered. In the future, the treatment submodel may be expanded to incorporate variations in treatment and the resulting outcomes. A woman with breast cancer who is not selected to receive treatment will have the same age at death and cause of death as she did in the natural-history simulation. A woman who is selected to receive treatment will proceed to the survival submodel, where her age at death and cause of death will be based on survival distributions.

Survival submodel
Only women selected to receive treatment enter the survival submodel. For each woman who receives treatment, the survival submodel generates an age at death and a cause of death that may differ from the corresponding outcomes in the natural-history simulation, in which treatments are not administered. SEER (National Cancer Institute, 2009) provides estimates of the probability that a woman who receives treatment will live for a specified number of years, Because the SEER data set fails to account for death from other causes, the survival submodel also uses annual probabilities for death from other causes based on the breast cancer-adjusted life tables of Rosenberg (2006). Therefore, in the survival submodel, each woman who receives treatment is assigned a death age from breast cancer and a death age from other causes; then the minimum of these two ages is used to assign her age at death and the cause of death.
In the screening-and-treatment simulation, a woman receiving treatment often lives longer than her untreated counterpart in the natural-history simulation, especially if her cancer is diagnosed in the early stages. Therefore, for each treated woman in the screening-and-treatment simulation, we define her life-years saved as the number of additional years she lived in the screening-and-treatment simulation compared with the number of years she lived in the natural-history simulation. This is an important measure of effectiveness for screening policies, because it is  essentially the benefit in a cost-effectiveness analysis based on the cost-benefit ratio. One of the most important features of the screening-and-treatment simulation is its ability to estimate the "true" years of life saved for each treated woman in the simulated population without the bias caused by lead-time or length-biased sampling. Equally important is the ability to estimate the "true" QALYs saved for each treated woman in the simulated population without these sources of bias. Accumulating QALYs saved for each woman in the screening-and-treatment simulation requires utilities that represent the quality of a woman's life at different ages as well as the stage of breast cancer at the time of diagnosis. A woman diagnosed with local cancer will have a higher quality of life than a woman diagnosed with distant cancer, because local cancer is more easily treated and attacks a smaller portion of the body. Similarly, a younger woman will have a higher quality of life than an older woman, because a younger woman has a greater life expectancy and is generally healthier. The cost-effectiveness of breast cancer screening for middle-aged women has been addressed by many papers in the literature, and Tosteson et al. (2008) provide utilities used to compute QALYs saved for breast cancer screening in women of all ages. Table 10 gives these utilities as a function of age and cancer stage at the time of diagnosis. For each year that a woman lives beyond her counterpart in the natural-history simulation, we record not only a life-year saved but also the corresponding number of QALYs saved.

Costing submodel
Since one of our ultimate goals is to determine the most cost-effective screening policy (or policies) from a societal Note: μ is the overall mean cost of each procedure, min is the minimum value.
perspective for women of age 65+, we use data for Medicare reimbursement rates to determine the costs of exams and treatment. In addition, to separately monitor the costs of screening exams, diagnostic exams, workup exams, and treatment, we monitor the total cost incurred during the periods 2001-2011 and 2012-2020 together with the costs incurred by mammographic and clinical detection of breast cancer. To perform a cost-effectiveness analysis, we compute the present value (i.e., the value in 2012 dollars) of the total cost for each year during the period 2001-2020 using an interest rate of 5%. Our costing data are derived from two sources. The primary source is the Medicare reimbursement database (U.S. Department of Health and Human Services, 2012), which gives the costs of many procedures according to geographic location. We used the Stat::Fit software (Geer Mountain Software Corporation, 2012) to fit an appropriate probability distribution to each of the available data sets. The details of the distribution-fitting process, including goodness-offit statistics, are presented in the Online Supplement and in Appendix D of Tejada (2012). For procedures with no available data, we used the costs listed in Tosteson et al. (2008). To represent the costs of screening, diagnostic, and additional workup procedures, Table 11 summarizes the following: (i) the distributions fitted to the data sets from the Medicare reimbursement database; and (ii) the constant costs estimated by Tosteson et al. (2008). Because the Medicare reimbursement rate has been fairly constant for these exams over the last 5 years, we did not adjust these costs by an interest factor as we did for treatment costs. For our modeling purposes, it is convenient to define treatment costs by stage of treatment and by the stage of breast cancer at the time of diagnosis. These treatment costs expressed in 2005 dollars are given in Table 12.
Treatment costs and reimbursement rates can be expected to increase as new and improved treatments become available. To account for this increase in the costs of treatment, these costs are multiplied by the ratio of the Medical Consumer Price Index (MCPI) for the year in which treatment is given divided by the MCPI for 2005 (the year these costs were estimated), yielding Data on the MCPI were obtained from the online database of the Bureau of Labor Statistics (U.S. Department of Labor, 2012). Data are available back to 1935, and we fitted a second-degree polynomial model to this data set (adjusted R 2 = 0.9913) to predict the MCPI for the period 2012-2020. With information about both cost and QALYs for both mammographic and clinical detections during the periods 2001-2011 and 2012-2020, we calculate the costeffectiveness ratio "average cost per QALY saved" for each year in the screening-and-treatment simulation and for each method of detection. The most important of the performance measures is average cost per QALY saved for mammographic detections between 2012 and 2020.

Combined DES/SD model of U.S. screening-and-treatment system
The remainder of this article describes the development of the screening-and-treatment simulation model, including a detailed discussion of all the key assumptions and a description of each of the interacting submodels. First, we dis-cuss the DES submodel, with its emphasis on the detailed behavior of each individual woman in the simulated population; then we discuss the population-level SD submodel and how the two submodels exchange information and interact while running simultaneously. Lastly, we present the results of using the screening-and-treatment simulation to identify cost-effective or life-saving screening policies; and we examine those results in detail. Methods for calibration and validation of the screening-and-treatment simulation are developed in a follow-up article (Tejada et al., 2013a).

Structure and operation of the screening-and-treatment simulation
The purpose of the SD submodel is to represent populationlevel elements of the screening process, specifically those factors influencing adherence to a given screening policy. It may have been possible to capture these effects using a pure DES approach, but the increased computational complexity of such an approach would have caused excessive run times and thus would have prevented us from effectively using simulation optimization techniques to identify promising screening policies. Adherence to a screening policy is based on a number of factors, some at the population level, such as the amount of congestion at screening facilities, and others at the individual level, such as the presence of other comorbid diseases in each woman. Figure 3 shows the combined DES/SD causal loop diagram for the screening-and-treatment simulation. The top half of the figure displays characteristics of individual women, which are represented as attributes of the associated entities in the DES submodel; and the bottom half of the figure displays characteristics of the population, which are represented by state variables in the SD submodel.
The SD and DES submodels are related through the following: (i) a logistic regression equation for predicting nonadherence to breast cancer screening as a function of key attributes of each individual woman; (ii) the "primary state variables" that directly affect the key attributes in (i); and (iii) "hybrid state variables" that directly affect the operation of the DES submodel. To model adherence for each woman in the simulated population, Gierisch et al. (2010) formulate a logistic regression equation for predicting the probability that a woman will not attend a scheduled screening appointment as a function of the relevant population-level 716 Tejada et al. characteristics as well as the relevant patient-specific attributes. As explained in Section 3.4, we added an additional factor, screening interval, to the Gierisch logistic regression equation; and based on discussions with our breast cancer experts, we assumed that nonadherence to screening decreases as the screening interval decreases.
After finalizing the logistic regression equation for nonadherence, we formulated an SD submodel representing elements of the screening process at the population level. First, we identified the primary state variables directly affecting the attributes of each individual woman that are predictor variables in the logistic regression equation for nonadherence; and then we identified other intermediate state variables that could potentially affect those attributes. Some state variables are user inputs, and intermediate state variables are functions of some of the user-assigned state variables or other state variables. Hybrid state variables are defined at the population level but are directly linked to individuals. For example, the probability of a false-positive mammogram is defined for the entire population, but it also depends on the attributes of each individual woman as specified in Table 4. Moreover, yearly additive changes in the probability of a false-negative mammogram depend on the intermediate state variable that represents the current level of screening technology as explained in the paragraph following Equation (11). All primary, hybrid, and intermediate state variables are updated on an annual basis in the simulation. These state variables are described in detail in the next section, which focuses on the linkage between the DES and SD submodels.

State variables in the SD submodel
Beyond the attributes of each individual woman that were established in the natural-history simulation model and then transmitted to the screening-and-treatment simulation model, the latter incorporates some new individual attributes into its DES submodel as well as all the state variables required for its SD submodel. The following section explains these attributes and state variables, including their units, lower bounds, and upper bounds.

User-specified state variables
Four of the state variables in the SD submodel are user inputs. The user can adjust these state variables through the user interface and control them to an even greater extent via the logic of the SD submodel. The number of screening facilities, represented by the state variable NumOfFacilities(t), is fixed for each year t in the range 2001-2011, but the user can control the trend in the number of facilities (increasing, slightly increasing, constant, slightly decreasing, or decreasing) over the period 2012-2020. Similarly, the user can control future trends in the aggregate capacity of these screening facilities, represented by the state variable CapacityOfFacilities(t) expressed as the total number of mammograms per year that can be performed by all the facilities. The state variable describing the level of advertising for breast cancer screening, PublicAds(t), can be assigned the values 1, 2, or 3 by the user for each year in the period 2001-2020. The state variable describing the level of breast cancer research, BCResearch(t), can also be assigned the values 1, 2, or 3 by the user for each year in the period 2001-2020. A full discussion of each of these state variables is provided in the Online Supplement and in Tejada (2012 that alter the operation of the DES submodel, or they may influence primary state variables that directly affect adherence. For example, the average distance to a screening facility is a function of the number of screening facilities in the United States. The average level of congestion at screening facilities is a function of the demand for screening, the number of facilities, and the average capacity of those facilities. The average time to get test results, in days, is a function of the intermediate state variables for screening technology and the average level of congestion at screening facilities. Finally, appointment availability is a function of the congestion at screening facilities. Formal definitions of all intermediate state variables can be found in the Online Supplement and in Tejada (2012).
Average distance to a screening facility: Using a facility location optimization routine developed by Bucci et al. (2007), we estimated the average distance to a screening facility, in miles, as a function of the number of facilities in the United States. The optimization routine was run using between 500 and 20 000 facilities in increments of one facility to account for all possibilities. Then, a polynomial was fitted to these data points, enabling rapid estimation of the average distance to a facility as a function of the total number of facilities (adjusted R 2 = 0.989). The intermediate state variable DistanceToFacilities(t) is computed as a function of NumOfFacilities(t) as follows: Level of congestion at screening facilities: The intermediate state variable representing the level of congestion at screening facilities, CongestionAtFacilities(t), is intended to reflect how crowded screening facilities may become with the aging of the U.S. female population. Waiting time is a commonly used measure of congestion. However, the computational complexity of having each individual woman queue up at a screening facility would cause the screening-and-treatment simulation to have extremely long run times, making optimization routines difficult to apply. Thus, we chose to represent the average waiting time on an aggregate population level. In each year t of the simulation, we know the following: the number of screening facilities, NumOfFacilities(t); the average service rate of these facilities, ServiceRatePerFacility(t); and the number of women who received a mammogram in the simulation during the previous year, NumberOfMammograms(t − 1).
To estimate DemandforScreening(t), the total demand for mammography in the entire U.S. female population during year t, we perform the following steps: (i) we multiply by 1000 the number of women in the simulated population who received a mammogram in year t − 1 to estimate the number of women of age 65+ in the U.S. population who received a mammogram in year t − 1; (ii) we use a regression equation to predict the proportion of individuals in the entire U.S. female population who will be of age 65+ during year t; and (iii) we divide result of step (i) by the result of step (ii) to obtain DemandforScreening(t). To perform step (ii), we fitted a linear regression equation to the proportion of individuals in the entire U.S. female population who were of age 65+ during each year t in the period 2004-2010 (adjusted R 2 = 0.909). The resulting predictions are given in Table 13 along with ScaleFactor(t), which is the inverse of the result of step (ii) for year t.
To estimate λ(t), the average hourly arrival rate for each facility during business hours in year t, we assumed that the demand for mammograms is uniformly distributed across all facilities and that each facility operates 8 hours per day for 250 days per year. To initialize the screeningand-treatment simulation, we assumed that for year t = 2000 the demand for screening in the simulated population would have been NumberOfMammograms(t) = 9600. Based on these assumptions, we developed the follow- ing method for approximating CongestionAtFacilities(t) expressed in minutes. The Pollaczek-Khinchine formula (Kelton et al., 2010) is used to compute the long-run average waiting time, which we take as our approximation to CongestionAtFacilities(t) on the assumption that in each year t, every screening facility rapidly achieves nearly steady-state operating conditions. Let the random variable S denote the service time (i.e., the time to perform a mammogram) expressed in hours, and let μ S = E[S]. We assume that S ∼ Uniform(0.9μ S , 1.1μ S ) hours. In some years, it is possible for the arrival rate to exceed the service rate, and in these situations the Pollaczek-Khinchine formula is inapplicable. To adjust for this possibility, we limit the maximum value of congestion to 100 minutes so that we have , 100 minutes. (10) Note that DemandPerFacility(t) is the average hourly demand per facility for mammograms by the entire population of U.S. women (and not merely women of age 65+) in year t, and ServiceRatePerFacility (t) Screening technology is assumed to have an impact on the probabilities of the following: (i) false-negative screening exams; and (ii) false positives for screening and diagnostic exams. The hybrid state variables FalseNegPercent(t) and FalsePosPercent(t) represent probabilities (i) and (ii), respectively; and their corresponding base values are given in Tables 4, 5, and 6. We assume that increases in screening technology will cause FalsePosPercent(t) to increase and FalseNegPercent(t) to decrease, as seen over the last 10 years. In particular, additive yearly adjustments to FalseNegPercent(t) and FalsePosPercent(t) are functions of ScreeningTechnology(t); and each yearly adjustment has a minimum value of 0.0 and a maximum value of 0.05. The state of screening technology is also assumed to affect survival after treatment. As technology improves, survival after treatment will improve. Thus, we introduce an additive yearly adjustment, TreatmentSurvivalAdj(t), that represents the annual increase in a woman's survival time, expressed in years, as a function of the level of screening technology. We define another adjustment factor, StageFactor, that acts as a multiplier for TreatmentSurvivalAdj(t) based on the stage of cancer at diagnosis. There will be less improvement in survival for distant cancer than there will be for localized cancer, so StageFactor and TreatmentSurvivalAdj(t) (14) is based on the following assumptions: (i) in ideal circumstances the minimum time to get results from a screening mammogram is 5 days; (ii) an additional delay of up to 15 days can be caused by congestion at the screening facility, and this delay is proportional to the average level of that congestion; and (iii) another additional delay of up to 15 days can be caused by an inadequate level of screening technology, and this delay is proportional to the complement of the current level of screening technology.
Appointment availability: Appointment availability can affect access to screening for each woman in the population. We assume that appointment availability decreases as the screening facilities become more congested. Appointment availability is a dimensionless intermediate state variable measured on a zero-one scale, where a value of 0 indicates a significant delay in appointment availability, and a value of 1 indicates no significant delay in appointment availability. All values between 0 and 1 represent different levels of appointment availability as a function of the congestion at screening facilities. Appointment availability is one of two factors influencing access to screening, a primary SD state variable. The expression for appointment availability is The rationale for Equation (15) is that the magnitude of an increase in appointment availability is proportional to the magnitude of corresponding decrease in the average level of congestion.

Primary state variables linked to adherence
Primary state variables directly affect the predictor variables used in the logistic regression equation for nonadherence. The three primary state variables are the following: (i) access to screening, denoted by Access(t); (ii) satisfaction with the breast cancer screening process, denoted by ScreenProcSat(t); and (iii) public breast cancer awareness, denoted by PublicAwareness(t). Each of (i), (ii), and (iii) is a dimensionless state variable measured on a zero-one scale, with 0 being the worst possible value and 1 being the best possible value. Access to screening: This primary state variable is assumed to increase as the average distance to facilities decreases and as appointment availability increases so that we have Because NumOfFacilities(t) is in the interval [5000, 15 000], we see from Equation (3) that lower and upper bounds for DistanceToFacilities(t) are 6.395 miles and 51.83 miles, respectively; thus, the term in square brackets on the righthand side of Equation (16) can take any value between 0.0 and 0.5. Therefore, Access(t) takes values in the interval [0, 1], with 0 being the worst possible value and 1 being the best possible value. Satisfaction with the screening process: The primary state variable for screening process satisfaction is assumed to decrease with increases in any of the following intermediate state variables: the average distance to screening facilities, congestion at the screening facilities, or the average time to get screening results. The level of satisfaction with the screening process, ScreenProcSat(t), is defined as Public breast cancer awareness: The primary state variable representing public awareness of breast cancer in the population is assumed to increase with increases in either the level of public advertising for breast cancer or the level of breast cancer research according to the relation Because both PublicAds(t − 1) and BCResearch(t − 1) can only take the values 1, 2, or 3, PublicAwareness(t) takes the values 0, 0.25, 0.5, 0.75, and 1, with 0 being the worst possible value and one being the best possible value.

Individual attributes from the DES submodel
Recall that each individual woman in the simulated population has her own specific set of attributes characterizing her current health state at each point in time; and together with the primary state variables, these attributes govern the future evolution of her health during the course of the simulation.

Attributes indirectly affecting adherence
The following attributes indirectly affect adherence by their impact on the predictor variables in the Gierisch logistic regression equation for predicting nonadherence: (i) age; (ii) Body Mass Index (BMI); (iii) the number of other people in the household; and (iv) the level of satisfaction with the individual screening technician. Formal definitions of these attributes along with detailed explanations of each can be found in the Online Supplement and in Tejada (2012).

Attributes directly affecting adherence
The following attributes of each woman are predictor variables in the logistic regression equation of Gierisch et al. (2010): (i) presence of comorbidities; (ii) satisfaction with the screening process; (iii) number of perceived barriers to screening; (iv) intention to adhere to the screening policy; and (v) number of screening appointments kept out of the last two scheduled appointments. The presence of co-morbidities, an attribute representing the woman's health status, is a function of age and BMI. Satisfaction with the screening process is a function of the woman's satisfaction with her individual screening technician and her satisfaction with the overall screening process. The number of perceived barriers to screening is a function of the woman's household size, health status, and her access to screening. The woman's intention to adhere to the screening policy is a function of her satisfaction with the screening process, her number of perceived barriers, and the level of public awareness of breast cancer in the population. Finally, a highly significant factor in the Gierisch logistic regression equation is the number of times a woman adhered to the given screening policy out of the last two screening opportunities. Formal definitions of all these variables along with detailed explanations of each can be found in the Online Supplement and Tejada (2012).

Gierisch logistic regression equation for nonadherence: link between DES and SD submodels
The logistic regression equation of Gierisch et al. (2010) was estimated using data from Personally Relevant Information about Screening Mammography (PRISM), a health communication intervention study (DeFrank et al., 2009). PRISM was a National Cancer Institute (NCI)-funded intervention trial designed to enhance annual mammography adherence. It was conducted from October 2003 to September 2008 as part of the NIH Health Maintenance Consortium. The sampling frame for PRISM was female North Carolina residents of ages 40 to 75 who were enrolled in the North Carolina State Health Plan for Teachers and State Employees for two or more years before sampling. Gierisch et al. (2010) state that 80% of the women in this study were of age 50+. After an extensive review of the relevant literature, we concluded that this logistic regression equation represents the most appropriate match to our situation. The response variable for this equation is binary, with the values 0 and 1 respectively denoting adherence and nonadherence to the relevant screening appointment. The predictor variables or "covariates" are the categorical variables presented in Table 14. As previously discussed, for each woman in the simulated population these factors are dependent on the relevant state variables as well as her individual attributes. Factors with a p-value less than 0.05 were judged to be statistically significant and were therefore included in the logistic regression equation. There were seven statistically significant risk factors: age, health status, satisfaction with the screening process, number of perceived barriers to screening, intention to adhere to screening, the number times the woman has adhered to the screening policy out of the last two scheduled appointments, and the screening interval. However, because the given values for the age factor were "under 50" and "over 50," this factor was constant for our designated population of women of age 65+ and thus Table 14. Characteristics of individuals directly affecting nonadherence (Gierisch et al., 2010) Predictor(s)

Factor name (description)
Factor coding x 1 Comorbidity (presence of comorbidities) x 1 = 0 for no x 1 = 1 for yes x 2 MammSat (satisfaction with screening process) x 2 = 0 for not (or somewhat) satisfied x 2 = 1 for mostly (or totally) satisfied x 3 , x 4 NumOfBarriers (number of perceived barriers) x 3 = x 4 = 0 for no barriers x 3 = 1, x 4 = 0 for one barrier x 3 = 0, x 4 = 1 for two or more barriers x 5 IntentToAdhere (intention to adhere to screening policy) x 5 = 0 for no x 5 = 1 for yes x 6 , x 7 Last2Adherence (number of times adherent to screening appointment of last 2 opportunities) x 6 = x 7 = 0 for 0 of last two appointments x 6 = 1, x 7 = 0 for one of last two appointments x 6 = 0, x 7 = 1 for two of last two appointments x j , j = 8, . . . , 12 ScreeningInterval (indicator for a screening interval of j − 7 = 1, 2, 3, 4, or 5 years) x j = 0 for no x j = 1 for yes was not relevant for our purposes. The following risk factors were not statistically significant: race, education level, marital status, perceived financial situation, family history of breast cancer, doctor recommendation for mammogram within the last year, attitude about mammography, and perceived risk of getting breast cancer. Let p NAD denote the probability of nonadherence to the current screening appointment given the values of the categorical variables {x 1 , x 2 , . . . , x 12 } for a specific woman. In terms of the logit function, Logit(x) ≡ ln[x/(1 − x)] for x ∈ (0, 1), we exploit the findings of Gierisch et al. (2010) to conclude that Logit( p NAD ) has a linear regression on {x 1 , x 2 , . . . , x 12 }: From Equation (19), we see that . (20) Finally, we obtain the desired estimate of p AD , the probability of adherence, from the relation p AD = 1 − p NAD . The estimates of the regression coefficients {β j : j = 0, 1, . . . , 12} are given in Table 4 of the Online Supplement.

Performance comparison of screening policies
In this section we summarize the results of comprehensive experiments with the screening-and-treatment simulation model to compare the performance of alternative breast cancer screening policies over the period 2012-2020. To make a convincing case that the simulation-generated results for the period 2012-2020 are a valid representation of what can be expected to happen in the near future under each alternative screening policy, we validated the output of the screening-and-treatment simulation for the period 2001-2011 against SEER breast cancer data for the latter time period. Because of space limitations, the results of this validation are presented in the Online Supplement to this article and in Tejada (2012). A more extensive validation scheme for large-scale stochastic multi-response simulation models is the subject of a follow-up article (Tejada et al., 2013a).
With the help of our breast cancer experts, we chose five performance measures as the most important, and we ranked each policy in terms of those criteria (with rank one denoting the best policy). The five most important performance measures are- Preventing deaths from breast cancer is the primary objective of screening, and we argue that the number of lives saved (or its complement M 1 ) is the most important performance measure for a screening policy. Similarly, M 2 (the number of QALYs saved) is important because it measures not only the years of life saved but also the quality of those additional years of life accumulated over the entire population. The cost-effectiveness of a screening policy (e.g., M 4 ) gauges how much we are paying to save one life-year or one QALY. Another critical performance measure is M 3 , the percentage of cancers diagnosed in the distant stage because the associated women are not likely to live long, and their quality of life will be poor. The total cost of false positives M 5 is also important, but it is arguably less important than M 1 , M 2 , or M 4 . Moreover, M 5 fails to measure the full impact of false positives in terms of the following: (i) the emotional distress suffered by individual women who receive a false-positive diagnosis; (ii) the increased difficulty of identifying breast cancers in the future owing to previous unnecessary incisions in a woman's breasts that were based on false-positive tests; and (iii) the adverse effect on a woman's future adherence to a given screening policy because of her past experience of (i) and (ii). Clearly, an area for future work is the formulation of a widely accepted, comprehensive measure of the effect of false positives.
In scenarios involving different overall objectives, we sought to optimize the corresponding performance measure as the basis for identifying the "best" breast cancer screening policy from a set of alternative policies. Therefore, we solved the following three stochastic optimization problems-  (Kelton et al., 2010) was used to identify the five best screening policies for each problem; then paired Student's t-tests were performed to identify the policy or group of policies that can be declared statistically best in terms of the associated performance measure.
Each experiment with the screening-and-treatment simulation consists of 10 independent runs and generates a full set of results, including: (i) the percentage of women treated; (ii) tables with yearly values for each state variable and system performance measure; (iii) tables with yearly average values for individual characteristics of interest; and (iv) tables of breast cancer incidence and death rates for each year in the time horizon. The simulation output also includes graphs of the following: (i) the stage distribution at diagnosis; (ii) the total cost and a breakdown of costs by type of procedure and method of detection; and (iii) total life-years saved and QALYs saved by method of detection.
See the Online Supplement to this article or Tejada (2012) for complete experimental results.
When the objective for the period 2012-2020 is to minimize breast cancer deaths (problem S 2 ) or to maximize QALYs saved (problem S 3 ), among the thousands of policies tested we found that the following five screening policies were best (where all policies have a starting age of 65 years)- • P 1 : Annual screening stops at age 80.
• P 3 : Annual screening for the top 10% in terms of risk, biennial screening for all others, and screening stops for everyone at age 80. • P 4 : Annual screening for the top 5% in terms of risk, biennial screening for all others, and screening stops for everyone at age 80. • P 5 : Biennial screening stops at age 80.
As elaborated in Tejada et al. (2013b), existing breast cancer screening guidelines for women of age 65+ are limited and conflicting. However, two guidelines have received widespread recognition: (i) the recommendation of the American Cancer Society (2013) that as long as a woman of age 40+ is in good health, she should receive annual screening; and (ii) the recommendation of the U.S. Preventive Services Task Force (2009) that a woman should receive biennial screening from age 50 to age 74 (including the latter age), and that the current evidence is insufficient to assess the additional benefits and harms of screening mammography in women of age 75+. Hence, in addition to screening policies P 1 through P 5 , our simulation-based performance evaluation includes the following policies: • ACS: annual screening for all women of age 65+; • USP: biennial screening with a starting age of 65 and a stopping age of 75 (so that screening occurs at ages 66, 68, 70, 72, and 74).
For problem S 2 (minimize breast cancer deaths), Tables 15 to 21 summarize the results (i.e., the sample averages across 10 replications) for the main performance measures accumulated over the simulated population, whose size is 0.1% of the size of the population of U.S. women of age 65+. Thus, the tabulated results must be multiplied by 1000 to yield comparable estimates for the latter population.
Tables 20 and 21 summarize the results for each of the top-five screening policies as well as the current guidelines ACS and USP. These tables include the estimated mean values for each of the five primary performance measures and the associated ranking of each policy with respect to each performance measure. Note that the sum of ranks provides a rough overall measure of the performance of each screening policy when the five main performance measures are taken into account simultaneously. Based on our review of Tables 15 to 21, we concluded that in terms of breast cancer deaths, QALYs saved, and percentage of cancers diagnosed in the distant stage, the best alternative was  policy P 1 (i.e., annual screening with a starting age of 65 and a stopping age of 80). Although policy P 1 ranked third in cost-effectiveness and last in cost of false positives, these measures were judged to be less important; and from a practical standpoint we did not feel the negative impact on lives saved that would result by switching from policy P 1 to policy P 2 (i.e., annual screening starting at age 65 and stopping at age 75) was justified by the increase in cost-effectiveness or by the decrease in the cost of false positives that would result from such a switch. For example, from Table 20 we see that on average there were 14.4 fewer cancer deaths with policy P 1 compared with policy P 2 ; however, because we are only simulating 0.1% of the designated population of U.S. women of age 65+, this translates into an estimate of 14 400 fewer cancer deaths for the designated population over the period 2012-2065. We considered the latter result to be strong evidence that policy P 1 was preferred in comparison with policy P 2 . When comparing policies P 1 through P 5 with the current guidelines ACS and USP, we concluded that although the USP policy was the most cost-effective and had the lowest cost of false positives, it compared unfavorably with policy P 1 in terms of saving lives. The ACS policy was superior to all the other policies in terms of saving life; however, it compared unfavorably with policy P 1 in terms of cost/QALY saved, and the cost of false positives. We concluded that policy P 1 was the most effective compromise between policies ACS and USP.
See the Online Supplement and Tejada (2012) for a complete discussion of the paired-sample Student's t-tests that we used in the context of simulation optimization problem S 2 to perform pairwise comparisons of policies P 1 through P 5 with respect to performance measures M 1 through M 5 using information from each of the 10 simulated populations of U.S. women of age 65+.
Remark 1: In the context of simulation optimization problem S 1 (minimize cost/QALY saved), the most costeffective screening policy was found to be the following-• P 6 : Annual screening for the top 5% in terms of risk, screening every 4 years for all others, and screening stops for everyone at age 70.
Under policy P 6 we estimated that E[M 4 ] ≈ $37 786, which is 21% less than the average cost/QALY saved under policy P 1 . On the other hand, we estimated that under P 1 on average 42 800 fewer deaths would be caused by breast cancer in the designated population during 2012-2065 when P 1 was compared with P 6 . Moreover, under P 6 we estimated that E[M 3 ] ≈ 44%, which we judged to be an unacceptably high percentage of breast cancers diagnosed in the distant stage. After reviewing with our experts the results of this policy and other policies with similar cost-effectiveness, we realized that simply finding the most cost-effective policy was not equivalent to finding the "best" policy to use for women of age 65+.

Limitations of the study
The screening-and-treatment simulation model has the following limitations: 1. The model is only applicable to the designated population of U.S. women of age 65+. 2. All risk factors aside from age, including family history of breast cancer and BMI, were assumed to remain constant over time. This may not be true of all risk factors for all women in the designated population. 3. As detailed in Tejada (2012), we assumed a Markov chain model for the progression over time of an individual woman's attribute for the presence of comorbidities-i.e., her health status in the next year is assumed to depend only on her age and health status at the current time. In fact, her future health may also depend on other key medical and socioeconomic variables; and this dependence may extend to several years in the past. 4. We used expert opinion to determine the probability of a woman being treated for breast cancer based on her age and health status as summarized in Table 9. Moreover, in the screening-and-treatment simulation, we simply make a treat or no-treat decision without attempting to account for the type of treatment to be given or the impact of that treatment on the woman's survival time; instead, we sample her remaining lifetime from the relevant SEER-based distribution of a treated individual's time until death given the cancer stage at diagnosis as depicted in Fig. 2. We used this approach because detailed data on survival by type of treatment were not readily available from the literature or other sources. 5. We simplified the SD submodel by using several dimensionless intermediate state variables that take values in [0, 1]. We also used simple inputs for breast cancer advertising and research (low, medium, and high). What these levels actually mean may be subject to interpretation; however, the goal of the model is to determine the differences between scenarios, so it is valid to consider the relative difference between a low and high level of breast cancer research. The strategy for causing a shift in those input levels may prove to be complicated; and in the absence of detailed information about the mechanism governing changes in these state variables, we chose to use a minimum-information approach to modeling such changes. 6. The precise functional forms of the interdependencies by which some state variables affect other state variables and the attributes of individual women were not known in all cases. Therefore, we made assumptions about these relationships under the guidance of expert opinion when the necessary information was not available.

Conclusions and recommendations
On the basis of extensive experimentation with the screening-and-treatment simulation, we found that from the perspectives of both practical and statistical significance, annual breast cancer screening for all women with a starting age of 65 and a stopping age of 80 is a superior screening policy for older women in terms of achieving an effective compromise between the conflicting objectives of maximizing lives saved and minimizing the cost per QALY saved. Nevertheless, some policy makers may not judge these performance measures to be the most relevant in some contexts; and one of the primary features of the screening-and-treatment simulation is its ability to evaluate alternative screening policies in terms of almost any relevant performance measure. Thus, others can review full sets of results generated by this simulation and make decisions about which screening policy they consider to be the best, and they will have numerical evidence from a statistically validated simulation model to support their perspective. Many researchers and practitioners working at the interface between computer simulation and health care systems engineering have strongly supported the idea of effectively integrating the DES and SD methodologies in large-scale health care simulations; and we believe that our screening-and-treatment simulation can be regarded as a template or guide for how future combined DES/SD simulation models may be designed for other application domains. In addition, the screening-and-treatment simulation provides an approach to modeling a complex disease and the screening and treatment of that disease in a population when several disparate performance measures are of nearly equal importance.
Among the principal directions for future work, special attention should be given to the following options: 1. Developing a more accurate representation of each woman's attribute for the presence of comorbidities as that attribute depends on her age, health status, and other key medical and socioeconomic variables not only at the current time but also in the past. 2. Developing a more accurate representation of the way in which each woman's risk factors for breast cancer such as BMI and family history of breast cancer evolve over time. 3. Developing more accurate representations of the types of treatment for women who are diagnosed with breast cancer and for the survival of those women after treatment as that survival process depends on the type of treatment and the stage of breast cancer at diagnosis. 4. Formulating a definitive measure of the total effect of false-positive screening and diagnostic exams that can be used to compare alternative screening policies.