Combined DES/SD model of breast cancer screening for older women, I: Natural-history simulation

Two companion articles develop and exploit a simulation modeling framework to evaluate the effectiveness of breast cancer screening policies for U.S. women who are at least 65 years old. This first article examines the main components in the breast cancer screening-and-treatment process for older women; then it introduces a two-phase simulation approach to defining and modeling those components. Finally this article discusses the first-phase simulation, a natural-history model of the incidence and progression of untreated breast cancer for randomly sampled individuals from the designated population of older U.S. women. The companion article details the second-phase simulation, an integrated screening-and-treatment model that uses information about the genesis of breast cancer in the sampled individuals as generated by the natural-history model to estimate the benefits of different policies for screening the designated population and treating the women afflicted with the disease. Both simulation models are composed of interacting sub-models that represent key aspects of the incidence, progression, screening, treatment, survival, and cost of breast cancer in the population of older U.S. women as well as the overall structure of the system for detecting and treating the disease.


Introduction
In 1980, 11.3% of the U.S. population was at least 65 years old. In 2030, it is projected that over 20% of the U.S. population will be at least 65 years old, a dramatic increase (Holmes and Hyman, 2003;Downey et al., 2007;U.S. Census Bureau, 2009). Medical advances have also made it possible for people to live longer; the U.S. Government's life tables reveal that the life expectancy of a 65-year-old woman is 19.5 years, which suggests that women of age 65+ still have considerable life expectancy (Social Security Administration, 2010). With the aging of the U.S. population, effective screening guidelines for older women will become increasingly important (Downey et al., 2007;Badgwell et al., 2008;Resnick and McLeskey, 2008;Mandelblatt et al., 2009). Breast cancer is one of the most common cancers among North American women, with 235 030 new cases of breast cancer and 40 430 deaths expected in 2014 (American Cancer Society, 2014b;Weiss, 2012). The benefits of mammography for middle-aged women are commonly accepted, and much work has been done in evaluating the costs and benefits of screening women in this age group (Mandelblatt et al., 2009;Nelson et al., 2009).
There are few studies on the benefits of mammography in women of age 70+, and none of these studies have been randomized controlled trials (U.S. Preventive Services Task Force, 2009). The U.S. Preventive Services Task Force does not recommend routine mammography screening in women of age 75+. However, most major health organizations recommend that healthy women of age 70+ should continue to get mammograms on a regular basis (American Cancer Society, 2014a). Breast cancer risk increases with age, and mammography does not appear to be less effective for women of age 70+.
There are no well-established screening guidelines for women of age 65+ (Resnick and McLeskey, 2008;Mandelblatt et al., 2009;National Cancer Institute, 2009; 0740-817X C 2015 "IIE" Fig. 1. Two-phase simulation approach to evaluating breast cancer screening policies. Nelson et al., 2009;U.S. Preventive Services Task Force, 2009;World Health Organization, 2009). The current guidelines for women of age 65+ are limited and conflicting. Furthermore, clinical trials for breast cancer screening have generally not included women of age 65+; and clinicians do not anticipate any clinical trials specific to breast cancer screening in the future (Crivellari et al., 2007;Badgwell et al., 2008;U.S. Preventive Services Task Force, 2009).
In this article and its companion article , we develop and exploit a two-phase simulation modeling framework for evaluating the effectiveness of breast cancer screening policies for U.S. women of age 65+. Figure 1 depicts the overall structure of this framework. Phase I is the focus of this initial article, encompassing a natural-history model of the incidence and progression of untreated breast cancer for randomly sampled individuals from the designated population of older U.S. women. The natural-history simulation is a Discrete-Event Simulation (DES) model that contains a population growth sub-model as well as incidence, progression, and survival sub-models.
One key output of the natural-history simulation is a database of older women whose breast cancer histories are known in their entirety; and these histories are critical inputs to Phase II, which encompasses the screening-andtreatment simulation and is the focus of the companion article.
The screening-and-treatment simulation integrates DES and System Dynamics (SD) modeling techniques into a single model, in which both stochastic details and population-level state variables are accounted for and allowed to interact. In addition to the previously mentioned sub-models, the DES/SD integrated screening-andtreatment simulation contains a population-level SD submodel that represents overall access to (and satisfaction with) screening as well as public awareness of the need for screening. The integrated DES/SD simulation modeling environment provides a flexible tool for evaluating the effectiveness of a wide range of population-level screening policies that can be compared directly on individual women who are representative of the designated population.
In the next section we survey the relevant literature on analytical and simulation models of breast cancer screening. The two simulations developed in our research go beyond the previous work in the following key respects: (i) they use a factor-based method to determine the annual risk of breast cancer for each woman individually; (ii) they use an individualized tumor growth equation, in which the tumor growth rate is a function of a woman's age, and the parameters of the equation vary randomly across different women in the system; (iii) they provide a direct linkage between the tumor growth equation and an individualized stochastic process representing the progression of breast cancer through its various stages for each affected woman in the system; (iv) they are calibrated to data from the period 2001-2010, and they project the impacts of both screening and operational policy decisions for the future years 2012-2020; (v) they allow screening policies to be individualized to other important risk factors, not just age; and (vi) they allow the impacts of changes in either screening or operational policies to be evaluated in the same modeling environment.
In a follow-up to the companion articles on the naturalhistory simulation and the screening-and-treatment simulation, we focus on the calibration, validation, and analysis of the two simulations (Tejada et al., 2013). In particular we discuss the following: (i) our method for calibrating the two simulations to historical data during the warm-up period 2001-2010; (ii) a new method for validating a large-scale stochastic multi-response simulation model with respect to numerous disparate sources of noisy real-world data; and (iii) a comprehensive analysis of the results of our two simulations over the time horizon 2012-2020 to predict the relative performance of different breast cancer screening policies for U.S. women of age 65 and above.
The remainder of this article is organized as follows. The relevant literature is surveyed in Section 2. Section 3 provides an overview of the natural-history simulation model and its primary sub-models. Section 4 contains a description of the cancer incidence sub-model, including detailed descriptions not only of the data set used for sampling individuals from the designated population but also of the risk equations used to estimate the annual breast cancer risk for those individuals. Section 5 presents the details of the disease progression sub-model, including our extensions of a Gompertz equation representing tumor size as a function of the elapsed time since the onset of cancer, and the details of a stochastic process representing the cancer stage at diagnosis as a function of tumor size. Sections 6 and 7 contain discussions of the survival and mortality sub-model and the population growth sub-model, respectively. The structure and operation of the natural-history simulation are described in Section 8, with particular attention devoted to important events that take place at system initialization as well as on an annual basis. Section 9 is an analysis of the results generated by the natural-history simulation, including (i) breast cancer incidence, prevalence, and death rates; (ii) age-specific population growth; and (iii) inferred distributions that may be of interest to other researchers. In Section 10 we summarize the model's limitations. The Online Supplement to this article and Tejada (2012) contain a more detailed discussion of the natural-history simulation. The screening-and-treatment simulation is fully discussed in the companion paper  and in Tejada (2012).

Relevant literature
Simulation and analytical models for screening have been developed for several diseases (refer to review articles by Sonnenberg and Beck (1993), Fone et al. (2003), Brailsford et al. (2009), andAlagoz et al. (2010) for examples of these models). Screening models can broadly be divided into two categories: (i) models for communicable diseases such as H1N1; and (ii) models for chronic diseases (Day and Walter, 1984) such as prostate cancer (Denton et al., 2011), cervical cancer (Armstrong, 2010), and diabetes (Waugh et al., 2007). The models in the former category focus on the modeling and control of the spread of disease (Hethcote, 2000). Those in the latter category focus on modeling disease progression with the goal of detecting disease early in order to control disease progression to minimize morbidity and mortality.
Our natural-history and screening-and-treatment simulations fall into the second category of screening for chronic disease. These models are characterized by the disease dynamics, which themselves present modeling challenges. By comparing the behaviors of different cancers, the need for distinct modeling solutions becomes apparent. For exam-ple, prostate cancer is often a slow-growing cancer so an acceptable treatment protocol is "watchful waiting," which makes it possible to observe more of the disease progression. On the other hand, breast cancer is an aggressive disease, for which the standard of care is either to treat surgically or to treat with radiation or chemotherapy upon detection (Ivy, 2009). This makes it difficult to observe the natural history of the disease because the disease is not allowed to progress after detection. Simulation models of the natural history of the disease are critical for characterizing breast cancer progression in order to measure the effectiveness of screening programs; on the other hand, such natural-history simulations are inherently challenging to develop.
Most cervical cancer originates from the Human Papilloma Virus (HPV), which is a communicable disease. Cervical cancer is unique in that it is a hybrid, being both a communicable and chronic disease. One of the focuses of cervical cancer screening models is the progression of the disease from HPV to cervical cancer and the development of HPV (Eddy, 1987;Armstrong, 2010). Some of the unique features of cervical cancer that add complexity are the fact that there are several types of HPV and not all types lead to cervical cancer; and there is a vaccine for several types of HPV. As a result, the focus of cervical cancer screening models is both early detection and preventive vaccination. The disease progression for each chronic disease (even each type of cancer) is unique; and, as a result, each disease presents its own set of modeling and methodological challenges. Therefore, it is not typically possible to take one disease screening model framework and apply it to another disease by simply changing parameters; instead, developing a representative screening model requires the development of methods to capture the critical dynamics of the disease under study.
Much of the previous work on breast cancer screening is focused on formulating analytically tractable optimization models and deriving the structural properties of screening policies that are optimal for those models in the sense of balancing a measure of medical effectiveness (e.g., lives saved or quality-adjusted years of life gained) versus a measure of the cost of screening. Some of the more recent work is focused on simulation models that are designed to avoid the restrictive assumptions often required for analytic tractability of optimization models. We elaborate these points in the following discussion.
To compare alternative breast cancer screening policies with respect to life expectancy gain, Shwartz (1978) estimates the probability of lymph node involvement at detection and the probability of recurrence based on a stochastic model in which incidence and mortality rates are nonstationary, but the tumor growth rate is stationary (i.e., it is randomly sampled from a distribution that does not depend on the woman's individual risk factors such as age). Although this model estimates the (non-stationary) probabilities of false-negative results, it does not allow false-positive results so that each screening test is assumed to have perfect sensitivity. Ozekici and Pliska (1991) develop a delayed Markov process model of the classic problem of inspecting and maintaining a system subject to deterioration and ultimate failure; but their formulation is described in terms of the medical problem of screening a patient for a cancerous tumor, where the patient's death corresponds to system failure. The authors use dynamic programming to derive a screening schedule that minimizes the expected value of total cost, which includes the costs of ordinary screening tests, false-positive tests, false-negative tests, medical treatments, and death. Beyond the initial sojourn time in the healthy state that will generally have a non-Markovian distribution, the patient's health state is assumed to be a stationary Markov process; and the probabilities of false negatives and false positives are assumed to be constants that depend only on the patient's current health state and not on the patient's individual risk factors such as age. The authors acknowledge that many of the costs in their model-especially the costs of false positives and false negatives-are difficult to estimate in practice and are often highly subjective. Mandelblatt et al. (1992) use a decision analysis model to determine whether annual breast cancer screening extends life for women of age 65+ with and without comorbid medical conditions. The authors conclude that annual screening extends life at all ages among all patients studied; but they suggest that for women of age 85+, short-term anxiety and discomfort associated with screening may outweigh the benefits. These findings suggest the need for further research into alterative screening policies for older women. The main limitations of this model are the following: (i) annual screening is the only alternative to no screening for women of age 65+; (ii) sensitivity and specificity for mammography and clinical examination are constants (i.e., they do not vary over time or with age); and (iii) the probability distribution for breast cancer stage at diagnosis is stationary (i.e., it does not depend on any of the patient's individual characteristics such as tumor size). Maillart et al. (2008) develop a partially observable Markov process model of breast cancer that captures the age-based dynamics of both the disease and the accuracy of test results (sensitivity and specificity). Instead of using optimization, the authors apply sample-path enumeration to evaluate numerous screening policies; and they generate an "efficient frontier" of screening policies, as defined by lifetime breast cancer mortality risk and the expected number of mammograms performed. This model takes into account the following factors: (i) age-dependent sensitivity and specificity of mammograms; (ii) age-related comorbidities for older women; and (iii) age-dependent probabilities of death from breast cancer and from other causes. The authors examine the performance of dynamic screening policies, where the screening interval can be age dependent. The result of this study is a trade-off curve with a menu of efficient policies; and an individual patient can select the policy that balances lifetime mortality risk and effort (number of times screening is needed). One of the authors' main conclusions is that screening should begin early in life and should continue until late in life regardless of the screening intervals adopted, which implies that screening older women is beneficial. The primary limitations of this model are the following: (a) disease progression is assumed to be Markovian; (b) mammography is assumed to be the only method for detection of the disease, which makes the analysis conservative; (c) there is no differentiation between regional and distant (advanced) disease because of the lack of data to estimate the associated state transition probabilities; and (d) mortality is the only measure of medical effectiveness used in the analysis.
The authors acknowledge that relaxing the Markovian assumption would likely dictate a DES approach as opposed to their approach based on sample-path enumeration. Michaelson et al. (1999) develop a simulation model of breast cancer progression using rates of tumor growth and spread of the disease based on data from the biological literature. The objective is to determine the optimal screening interval for detection of breast cancer prior to its distant metastatic spread. In this model, numerous tumors are simulated at the cellular level; and to predict when the disease will spread to other organs, the authors compute the probability that an individual cell will break away from the primary tumor and form a distant metastasis. The results of this simulation model suggest that, in comparison with no screening, policies based on biennial, annual, and semiannual screening would reduce the occurrence rate of distant metastatic cancer by 22, 51, and 80%, respectively. The chief limitation of this model is that it does not consider the costs to society and the individual patient of more frequent screening.
Simulation is one of the most widely used techniques for system modeling and analysis in industrial engineering and operations research, and many would consider it the method of choice for modeling health care systems (Brailsford, 2007). An important related breast cancer modeling effort is that of the Cancer Intervention and Surveillance Modeling Network (CISNET, 2006). The following aspects of the CISNET work on breast cancer were particularly relevant to the development of our simulation models: (i) competing risks to breast cancer as a cause of death (Rosenberg, 2006); (ii) tumor growth models for breast cancer (Tan et al., 2006;Plevritis et al., 2007); (iii) the University of Rochester's model of breast cancer detection and survival (Hanin et al., 2006); (iv) stochastic models for predicting breast cancer mortality (Lee and Zelen, 2006;Plevritis et al., 2006); and (v) the Wisconsin Breast Cancer Epidemiology Simulation Model (Fryback et al., 2006).
Focused on the female population of Wisconsin during the period 1975-2000, the simulation model of Fryback et al. (2006) encompasses the natural history of breast cancer, detection and treatment of the disease, and competing cause mortality. However, a limitation of this model is that it estimates the onset proportion-that is, the age-specific biologic onset rate divided by the age-specific incidence rate in the absence of mammographic screening-without taking into account the primary risk factors of each individual woman; and the subsequent breast cancer simulation is based on the estimated onset proportion. The model's detection techniques are mammographic screening and "clinical surfacing," in which the tumor is discovered by other means. A simple function is used to estimate technological improvements over time. The results of the Wisconsin model indicate that breast cancer mortality in Wisconsin during the period 1975-2000 would have been reduced by 38.3% with mammographic screening and adjuvant therapy (namely, tamoxifen or adjuvant chemotherapy) compared with the option of not using those techniques during the specified time period.

Overview of the natural-history simulation model
The natural-history simulation is composed of a number of interacting sub-models-namely, the cancer incidence sub-model, the disease progression sub-model, the survival and mortality sub-model, and the population growth submodel. As elaborated below, the most important input to the natural-history simulation is a database (Breast Cancer Surveillance Consortium (BCSC), 2010) that contains information about breast cancer risk factors for women in the designated population of U.S. women of age 65+. For each individual woman in the natural-history simulation, her risk factors are randomly sampled with replacement from this database when she enters the simulation.
The natural-history simulation establishes a baseline for capturing the benefits of screening and treatment by determining the earliest time that a cancer could be detected for each woman in the simulated population. We assume "perfect visibility" of the disease progression in all women composing the simulated population. We define the ideal of perfect visibility to mean that for each woman, every year her disease status is perfectly observed and verified so that the resulting diagnosis is error-free. For the naturalhistory simulation, we also assume that all cancers remain untreated. The time-step for this model is 1 year-i.e., we simulate all the events for a given year, then we move to the next year and repeat this process until we reach the end of the time horizon. This approach to the operation of the natural-history simulation makes the most effective use of the available literature, because the logistic regression equations of Barlow et al. (2006) predict a woman's probability of receiving a confirmed diagnosis of breast cancer within a year of a screening mammogram, where the prediction is a well-calibrated function of the woman's risk factors; and we use life tables with breast cancer deaths removed (Rosenberg, 2006) to predict the probability of surviving one more year as a function of a woman's current age and birth year. To establish a cancer history for each woman, each year we compute her risk of receiving a confirmed diagnosis of breast cancer within a year of a screening mammogram; and then we simulate the outcome that cancer is or is not detected during that year. As detailed in Section 5.1, for each woman in the simulation this process continues until one of the following occurs: (i) the woman is still alive, either with or without breast cancer, when the simulation stops at the end of the year 2020; (ii) the woman dies from breast cancer before the end of the simulation; or (iii) the woman dies from other causes before the end of the simulation. If the risk equations determine that the woman develops invasive cancer during the simulation, then the cancer is not treated; instead, the cancer is allowed to progress until one of the outcomes (i), (ii), or (iii) occurs. (The screeningand-treatment simulation presented in  considers screening at various intervals and as a function of individual risk factors, sensitivity and specificity of screening exams, post-diagnosis (adjuvant) treatment, costs, and survival after diagnosis.) Figure 2 shows the possible transitions between health states for each individual woman in the natural-history simulation. Notice that each woman in the simulated population initially begins in the no-cancer state of Fig. 2, and this assumption is consistent with our use of the BCSC (2010) data set and the risk equations of Barlow et al. (2006) in the natural-history simulation. We use the years 2001-2011 as a warm-up period to achieve a representative population during the period 2012-2020. Women who are diagnosed with Ductal Carcinoma In Situ (DCIS) are not cancer free; but in the natural-history simulation, we assume that women diagnosed with DCIS do not die from breast cancer based on the evidence provided in the SEER data (SEER, 2012a(SEER, , 2012b. Given this data and the fact that little is known about the progression of DCIS, we make the additional assumption that DCIS will not progress to invasive cancer in the natural-history simulation. The reasons for our approach to handling DCIS are elaborated in Section 5.2. It was necessary to locate a data set containing risk factors similar to those for U.S. women of age 65+. Age, race, Body Mass Index (BMI), breast density, and many other factors are important in determining a woman's risk of developing breast cancer (Barlow et al., 2006). The BCSC (2006,2009,2010) provided to us a "de-identified" data set containing breast cancer risk factors for slightly over 1000 000 women, where "de-identification" ensures that names, dates of birth, and other identifying information have been removed for the protection of the participants. The data set consists of information from seven mammography registries in different locations across the United States, and it is representative of the women in the U.S. population of age 65+. The BCSC was established in 1994 by the National Cancer Institute to assess community mammography outcomes, collect risk factors, and evaluate cancer outcomes (Ballard-Barbash et al., 1997).
Based on the breast cancer risk factors for each woman in the natural-history simulation, we exploit the logistic regression equations of Barlow et al. (2006) to estimate the woman's risk of receiving a confirmed diagnosis of invasive breast cancer or DCIS within the next year, given that she received a mammogram in that year. Except for age, all other risk factors used to predict a woman's probability of developing breast cancer within a year of a screening mammogram are assumed to stay constant throughout her lifetime.
The disease progression sub-model consists of the following: (i) a Gompertz equation representing tumor growth as a function of the elapsed time since the onset of cancer (Norton, 1988); and (ii) a stochastic process representing stage progression that was formulated by Plevritis et al. (2007) and that is used to determine the stage of invasive breast cancer at diagnosis as a function of primary tumor size. Norton validated the Gompertz tumor growth equation against three data sets for untreated tumor growth. We extend Norton's basic formulation in two ways: (a) we represent the random variation in the parameters of the Gompertz tumor growth equation across all the women in the simulation and; (b) we represent the systematic (non-random) dependence of the tumor growth rate on the woman's age as well as the random variation in the tumor growth rate across all women of the same age in the simulation.
The Plevritis stage progression process allows us to estimate the respective probabilities that invasive cancer is in the local, regional, or distant stage at the time of diagnosis. The stage of invasive breast cancer at diagnosis has a dramatic effect on the type of treatment and on the patient's subsequent quality of life.

Cancer incidence sub-model
Cancer incidence is defined as the number of individuals who are newly diagnosed with cancer over a given time period, typically a year. The breast cancer incidence rate is usually expressed as the number of newly diagnosed breast cancer cases per 100 000 women per year (National Cancer Institute, 2007).
For a randomly selected woman from the population targeted by the Barlow study, let the random variable (response) Y denote an indicator for the woman's receiving (Y = 1) or not receiving (Y = 0) a confirmed diagnosis of breast cancer within a year of a screening mammogram; and let X = (X 1 , . . . , X 11 ) denote the random vector of her associated risk factors (predictor variables) as defined in Table 1. Given the selected woman's specific risk-factor values x = (x 1 , . . . , x 11 ), the Barlow logistic regression equations estimate E[Y|X = x] (i.e., the conditional probability of developing breast cancer within a year of a screening mammogram) using a logistic function of an appropriate linear combination of the those risk-factor values, where E[Y|X = x] is the true probability of developing the disease within a year averaged over all women in the designated population for which the condition X = x holds. Barlow et al. (2006) formulated separate logistic regression equations for subpopulations consisting of premenopausal and postmenopausal women because cancer incidence rates suggest that for these two subpopulations, the risk patterns are not the same and the significant risk factors for developing breast cancer are quite different.
In our study, the designated population consists of U.S. women of age 65+, all of whom are assumed to be postmenopausal. The National Institute on Aging (2010) estimates that 51 is the average age for menopause, but it could occur at any age from the mid-40s to the late-50s. For postmenopausal women Barlow et al. (2006) find that all 11 risk factors in Table 1 are statistically significant at the 0.0001 level of significance. If p POST (x) denotes the Barlow estimator of E[Y|X = x] for postmenopausal women, then in terms of the logit (log-odds) function (1) The estimates of the regression coefficients β 0 , β 1 , . . . , β 11 are given in the Online Supplement to this article. To validate their logistic regression equation (1) for postmenopausal women, Barlow et al. (2006) used 75% of the original BCSC data as the training data set, from which they estimated the equation's parameters; and they tested the equation's prediction accuracy in the remaining 25% of the original BCSC data, which therefore served as the validation data set. As detailed in Section A1.3 of the Online Supplement, the Barlow logistic regression equation would be perfectly for all x. The degree to which that risk equation is well-calibrated can be measured by computing the ratio O/E, where: (i) O denotes the number of confirmed diagnoses (i.e., cases) of breast cancer within a year of a screening mammogram in the validation data set; and (ii) E denotes the expected number of cases predicted by the logistic regression equation for the validation data set. From Table 6 of Barlow et al. (2006), we see that in the validation data set O = 2303 and E = 2325.9 based on Equation (1) where z 1−α/2 is the (1 − α/2) quantile of the standard normal distribution; hence, an approximate 95% CI for the expected value of O/E is [0.9501, 1.031] . Table 6 of Barlow et al. (2006) also shows a p-value of 0.23 for the Hosmer-Lemeshow goodness-of-fit test, which compares the observed and expected numbers of cases for selected subsets of the validation data set (Hosmer et al. (2013), Section 5.2.2). Moreover for the overall BCSC data set of postmenopausal women, the Hosmer-Lemeshow goodness-of-fit statistic had a p-value of 0.29. If the Barlow risk equation for postmenopausal women were perfectly calibrated, then these p-values would be uniformly distributed on the unit interval [0, l]; and if the Barlow risk equation were poorly calibrated, then these p-values would have a distribution that is tightly concentrated close to zero. On the basis of the observed p-values, we concluded that the Barlow risk equation was well calibrated. Because the main performance measures generated by the natural-history and screening-and-treatment simulations are averages taken over all simulated patients, in our study good calibration is the most critical property of the Barlow risk equation as it is used in the cancer incidence sub-model.
On the other hand, the logistic regression equations of Barlow et al. (2006) have been criticized on the grounds that these risk equations cannot adequately discriminate women who will receive a confirmed diagnosis of breast cancer within a year of a screening mammogram from those who will not, because the c-statistics for premenopausal and postmenopausal women are 0.631 and 0.624, respectively, where the concordance c can range from 0.5 (no discrimination) to 1.0 (perfect discrimination). If Equation (1) were perfectly calibrated, then we show in the Online Supplement that the largest possible value for c is much less than 1.0; and, in fact, a perfectly calibrated risk model for predicting a postmenopausal woman's risk of receiving a confirmed diagnosis of breast cancer within a year of a screening mammogram has an upper bound of approximately 0.61 for its associated value of c. Although the Barlow risk equation is well calibrated, it is not perfectly calibrated. Thus, we would expect the maximum value of the c-statistic for the Barlow risk equation to be slightly greater than the computed value of 0.61 based on the assumption of perfect calibration. The trade-off between calibration and discrimination of risk prediction models is well known (Diamond, 1992;Gail and Pfeiffer, 2005;Cook, 2007); and we concluded that in the context of the naturalhistory and screening-and-treatment simulations, the good calibration of the Barlow logistic regression was much more important than its concordance c.
The Barlow risk equations enable us to identify women in the simulated population who would be diagnosed with either DCIS or invasive breast cancer. In the "de-identified" BCSC (2010) data set, 19.58% of breast cancer diagnoses were DCIS, and all others were invasive cancers. Therefore, once it is determined that a woman will be diagnosed with breast cancer in the natural-history simulation, with probability 0.1958 she will be diagnosed with DCIS; otherwise, she will be diagnosed with invasive cancer. Because the behavior of DCIS and its progression to invasive cancer are not well understood at this time (Fryback et al., 2006;Plevritis et al., 2007), we have chosen to leave that area for future research in the hope that more knowledge about DCIS will be available in the future. Despite not being able to track the progression of DCIS to invasive cancer, we can estimate the incidence rate for DCIS in the natural-history simulation (and in the screening-andtreatment simulation).

Extended Gompertz equation for tumor growth
In this subsection only, all times are expressed in months rather than years for notational convenience. To describe the size of a single breast cancer tumor at time t, we used a Gompertz equation for N(t), the number of malignant cells in the tumor at time t, that was proposed by Norton (1988). This approach was chosen because there are abundant data suggesting that breast cancer growth in an individual woman can be accurately represented by a general Gompertz equation (Pearlman, 1976;McManus and Welsch, 1980;Rae-Venter and Reid, 1980;Surbone and Norton, 1993). Starting from some initial tumor size N(t 1 ) at time t 1 , the tumor reaches the size N(t 2 ) at some future time t 2 as given by the Gompertz growth equation, Death caused by breast cancer is usually due to the spread of the disease to other organs. However, modeling these biological processes for each woman is extremely complex and would have drastically increased the execution time of the natural-history simulation. Thus, for each woman with breast cancer in the simulated population, we sampled her lethal tumor size of N L cells from an appropriate distribution; then we used N L to estimate her time of death given that breast cancer is the cause of death as detailed in the rest of this section.
In the relevant literature, some authors express tumor size as the number of cells in the tumor (Laird, 1964;Norton, 1988), whereas other authors express tumor size as the diameter of the tumor (Carter et al., 1989;Plevritis et al., 2007). It is important to be able to relate these two measures. Because most tumor cells are roughly 2.0 × 10 −2 mm in diameter (Boon et al., 1982;Pesce and Colacino, 1986;Van der Linden et al., 1986), it is reasonable to assume that the tumor density, D, has a value of D = 2.38732 × 10 5 cells/mm 3 (Michaelson et al., 1999). Therefore at time t, the tumor volume is given by V(t) = N(t) D mm 3 . We assume that each tumor is spherical in shape, so the tumor diameter (mm) in terms of the number of cells is given by d(t) = {[6N(t)]/(π D)} 1/3 mm, and the cell count N(t) is given in terms of the tumor diameter d(t) (in millimeters) by N(t) = Dπ[d(t)] 3 /6 cells. We exploit the relationships between the tumor-size measures N(t), V(t), and d(t) in dealing with estimates of tumor sizes at different phases of growth, and in determining the stage of the cancer at diagnosis.
Our first extension to the Gompertz tumor-growth equation involves adding random variation to the quantitites N L and N(∞) because it is known that they are not constant across all affected individuals (Norton, 1988). We make the basic assumption that N L and N(∞) are normally distributed random variables and that the means of these two random variables are Norton's "bestfit" estimates E[N L ] ≈ 10 12 cells and E[N(∞)] ≈ 3.1 × 10 12 cells. We assume a conservative coefficient of variation of 0.05 for both N L and N(∞), which determines the standard deviations of the corresponding normal distributions.
Our second extension to the Gompertz tumor-growth equation is to represent the systematic (non-random) dependence of the tumor growth rate on the woman's age. Breast cancer is believed to behave less aggressively in older women (Mandelblatt et al., 1992;Peer et al., 1993;Diab et al., 2000;Crivellari et al., 2007;Downey et al., 2007). Following Norton (1988), we assume that the tumor growth parameter b has a lognormal distribution; but to make the tumor growth rate a function of age expressed in whole years, we vary the lognormal mean and standard deviation based on the original distribution of b given by Norton. We do this as follows: we assign the 25th percentile of the original distribution of b as the expected value of the lognormal distribution of b for a 75-year-old woman; we assign the 75th percentile of the original distribution of b as the expected value of the lognormal distribution of b for a 25-year-old woman; and we let the mean of b vary linearly for all ages (expressed in whole years) between 25 and 75. The distribution for women over 75 is the same as for a 75-year-old woman. For women in each age category, the coefficient of variation of b is 0.05, which completely specifies the lognormal distribution of b for that category.
Since the development of modern treatments for breast cancer beginning in the 1950s, virtually no data have been collected on the untreated progression of the disease. Bloom et al. (1962) provide data on the natural history of untreated breast cancer based on clinical and postmortem records of the Middlesex Hospital (London) during the pe- Fig. 3. Survival of women with untreated breast cancer based on the data set of Bloom et al. (1962), the Gompertz tumorgrowth equation of Norton (1988), and our extended Gompertz equation.
riod 1805-1933. Norton (1988 fits a Gompertz equation to this data set. To validate our extensions of Norton's tumorgrowth equation, we developed a Monte Carlo simulation of invasive cancers in 10 000 women with age-dependent growth rates and variability in lethal and limiting size as described above. Figure 3 displays the results of our simulation superimposed on the Bloom data set and on comparable simulation-based results for Norton's tumor-growth equation. It is clear that our extended Gompertz equation predicts longer survival after diagnosis and that Norton's Gompertz equation provides a better fit to the Bloom data set, but our primary goal is to build a tumor growth equation that can validated under current conditions in the United States, not under conditions that prevailed in the United Kingdom during the period 1805-1933. In the United States the life expectancy of a white woman in 1850, 1931, and 2004 was 40.5 years, 62.3 years, and 80.8 years, respectively (Infoplease, 2014). Therefore we concluded that for the years 2001 and beyond, the life expectancy for U.S. women with untreated breast cancer would differ significantly from that of the Bloom data set.
It is difficult to assess the precise impact of this increased life expectancy for U.S. women with untreated breast cancer on the final performance measures for different screening policies. However, as noted in Section 2 of , the untreated breast cancer history of each woman that is generated in the natural-history simulation is then re-created and used in the screening-and-treatment simulation; and to each woman in the simulated population, we apply separately each screening policy selected for comparison so that the associated differences in performance are evaluated for each woman separately and then aggregated over the entire simulated population. Thus, the effect of increased life expectancy of U.S. women with untreated breast cancer will not unduly distort the estimated mean differences in performance between selected screening policies because this effect is consistently reflected in the health trajectory of each woman in the simulated population under each screening policy. This approach enables us to compute more precise point and CI estimators for the mean differences in performance between selected screening policies based on independent replications of the naturalhistory and screening-and-treatment simulations.
The natural-history simulation computes breast cancer risk on an annual basis for each woman in the simulated population. However, at some point most women with breast cancer will experience symptoms of the disease, or they will detect a lump in the breast; and it is possible that this may happen before or after cancer would have been detected by screening. Therefore, for each woman who develops breast cancer in the natural-history simulation, we must generate estimates of the tumor size at both mammographic detection and clinical detection.
If a woman receiving perfect annual observation is diagnosed with breast cancer, then at the time t md of mammographic detection, the tumor size N(t md ) is a function of the tumor growth rate for that woman. If the tumor is growing rapidly, then it will likely be larger at detection than a tumor that is growing slowly. Duffy et al. (2006) provide information on the size of tumors at mammographic detection, and they identify six different categories for tumor diameter: 1-10 mm, 10-15 mm, 15-20 mm, 20-30 mm, 30-50 mm, and 50-75 mm. Thus, we define six different classifications of tumor growth: very slow, slow, slightly slow, slightly fast, fast, and very fast. When a woman's tumor is detected by screening, her tumor growth parameter b is sampled from the appropriate lognormal distribution for an individual of her age. In order to determine which tumors fall in each category, we use selected percentiles of the original distribution for b. At first glance, it might appear that we should use selected percentiles of the age-dependent distribution from which the woman's growth constant b was sampled; however, the tumor growth classifications {very fast, fast, slightly fast, slightly slow, slow, and very slow} apply to the entire population of women with untreated invasive breast cancer instead of a subpopulation of women in a certain age range-and Norton's original lognormal distribution was specifically formulated to describe the population of all women with untreated invasive breast cancer.
If F 0 (u) ≡ Pr {b ≤ u} denotes Norton's original cumulative distribution function (c.d.f.) of the growth parameter b for all non-negative values of u, then, for example, the 10th percentile of Norton's original distribution is F −1 0 (0.10) .
In terms of this notation, we used the 10th, 25th, 50th, 75th, and 90th percentiles of Norton's original distribution to define the six growth classifications for each woman's tumor as follows: Table 2 summarizes this information. We assume that at the time t md of mammographic detection, the tumor diameter d(t md ) is uniformly distributed as specified in Table 2, where the minimum d min (b) and the maximum d max (b) of the uniform distribution depend on the tumor growth class g to which the tumor growth parameter b belongs-i.e., the class interval (bin) (b min (g), b max (g)] in which the value of b lies. Clinical detection generally occurs when the tumor size enters a critical range of values and the woman or her clinician notices the associated symptoms. Therefore, at the time t cd of clinical detection, the tumor size N(t cd ) is subject to random variation primarily because of human involvement in the detection process. Norton (1988) estimates that for clinical detection, the minimum and maximum tumor sizes are 10 9 cells (20 mm) and 5 × 10 9 cells (34 mm), respectively. Using a least-squares approach to estimating F 0 (·) with a lognormal c.d.f. based on the Bloom data set, Norton also obtains the "best-fit" estimate of 4.8 × 10 9 cells (33.7 mm) for the tumor size at clinical detection, and we take this value as the estimated mode of the distribution for N(t cd ). We fitted a generalized beta distribution to these estimated characteristics of N(t cd ), and we assumed that the standard deviation of N(t cd ) is one-sixth of its range (see section 2 of Kuhl et al. (2010)). As explained in Kuhl et al. (2010, equations (7) and (8)), the fitted beta distribution has its first shape parameter α 1 = 4.149 and its second shape parameter α 2 = 1.166. Clearly, if t md < t cd , then the cancer is detected by mammography before clinical symptoms are observed. On the other hand, if t md > t cd , then the cancer is detected because clinical symptoms have been observed by the patient or her clinician. In the natural-history model, each woman is assumed to seek care immediately after she observes clinical symptoms of invasive breast cancer. Moreover, we assume that DCIS is asymptomatic (Weiss, 2012).
Given that breast cancer has been detected in a woman either mammographically or clinically at time t 2 , we work backward using Equation (2) to infer the cancer onset time t on = t 1 based on the tumor sizes N(t 1 ) and N(t 2 ) at cancer onset and detection, respectively. To work backward from time t 2 to time t 1 , we perform the following assignments in the natural-history simulation: N(t 1 ) = N(t on ) = 1 cell (Norton, 1988); b ∼ Lognormal [μ b (age),σ b (age)] as described in the fourth paragraph of this subsection; Table 2; N(t md ) = (Dπ/6)[d(t md )] 3 cells; N(t cd ) ∼ 10 9 + (4 × 10 9 ) × Beta (4.149, 1.166) cells as detailed in the 10th paragraph of this subsection; N(t 2 ) = min{N(t md ), N(t cd )} cells; and finally we take N(∞) ∼ Normal ( μ N(∞) = 3.1 × 10 12 , σ N(∞) = 1.55 × 10 11 ) cells. Using the values of N(t 1 ), N(t 2 ), N(∞), and b, we solve Equation (2) for the time delay t D = t 2 − t 1 required for the cancer to grow from its initial size N(t 1 ) to the size N(t 2 ) at detection:

cells as specified by Equation (3) and
(4) therefore, it follows that the cancer onset time is t on = t 1 = t 2 − t D (months). The tumor size N(t 2 ) at the time t 2 of detection (either mammographic or clinical) is important because it is used to determine the stage of cancer at diagnosis as detailed in the next subsection. If the projected age at death caused by breast cancer is greater than the projected age at death from other causes, then in the natural-history simulation the woman's time of death is determined by the latter projected age and a non-breast cancer cause of death is recorded; otherwise, the woman's time of death is determined by the former projected age, and breast cancer is recorded as the cause of death. We also record an observation for each important variable (age at diagnosis, tumor size at detection, type of detection) for each woman diagnosed with breast cancer, enabling us to infer the distribution of unknown random variables, such as time to reach clinical symptoms.

Stochastic process representing the cancer stage at diagnosis
Invasive breast cancer is a progressive disease, and the stage at diagnosis plays a significant role in determining not only the type of treatment used but also the patient's prospects for survival. As we discussed in Section 3, breast cancer is typically defined in terms of three stages: local, regional, and distant. In the local stage, the cancer is confined within the breast. In the regional stage, the cancer has spread to the lymph nodes. In the distant stage, the cancer has spread to other parts of the body. In situ cancers such as DCIS have extremely high survival rates, and according to SEER data for the period 1988-2008, the survival rate is 100% for women diagnosed with DCIS (relative to women without cancer) in the age ranges 65-74 and 75+ and for all survival intervals ranging from 0 to 10 years (SEER, 2012a(SEER, , 2012b. Women who are diagnosed with local, regional, and distant invasive cancer have 5-year relative survival rates of 98.5, 83.6, and 23.2%, respectively (American Cancer Society, 2014b;SEER, 2012aSEER, , 2012b. The ultimate cause of breast cancer death is the spread of malignant cells to other parts of the body and the resulting destruction of other organs such as the brain and liver (Michaelson et al., 1999). Clearly, in the natural-history simulation (and in the screening-and-treatment simulation), we need a method for determining the stage of breast cancer at diagnosis, which is dependent upon the size of the tumor at diagnosis. Plevritis et al. (2007) use SEER data (National Cancer Institute, 2007) to construct a stochastic process representing the stage progression of breast cancer that enables us to estimate the probability of breast cancer being in the local, regional, and distant stages as a function of tumor diameter. The Plevritis stochastic process fits clinical data reasonably well, and it is easy to incorporate into the natural-history simulation since we have a method for determining the diameter of breast cancer tumors at diagnosis. (In the screening-and-treatment simulation, the stage at diagnosis is used to determine the length of time the woman will survive after treatment.) Recall that at time t (where all times are expressed in years throughout the rest of this article), the variables N(t), d(t), and V(t), respectively denote the number of cells in the tumor, the diameter of the tumor (in millimeters), and the volume of the tumor (in mm 3 ). Given N(t cd ) = n, Plevritis et al. (2007) compute the following: (i) the tumor volume v = n/(2.38732 × 10 5 cells/mm 3 ); (ii) the initial tumor volume v 0 = (π/6) [d(t on )] 3 mm 3 ; (iii) the conditional probability of being in the local stage, ( 5 ) (iv) the conditional probability of being in the regional stage, ( 6) and (v) the conditional probability of being in the regional stage,  Plevritis et al. (2007) provides maximum likelihood estimates of the parameters γ, η, ω, β, and α subject to the condition α = β; and taking these estimates along with γ = 0 yields the conditional probability distribution of the breast cancer stage distribution given the tumor size N(t cd ) = n observed at clinical detection. We assume that Equations (5) to (7) also apply at the time t md of mammographic detection in the case t md < t cd . After these probabilities have been computed for each woman who has a clinical or mammographic detection, we use them to determine the stage of that woman's cancer at diagnosis in the simulation. The Plevritis model allows us to determine the probability that cancer is in the regional stage (i.e., the stage in which only lymph nodes are involved). If the cancer is in the distant stage, then other organs are already involved. Lymph nodes are a medium for the cancer cells to reach other parts of the body, and the more lymph nodes that are positive for cancer the more likely it is that the cancer has or will spread to other parts of the body. By analyzing SEER data, Carter et al. (1989) describe the relationship of tumor size to lymph node status and overall survival. Carter et al. also provide data regarding the distribution of breast cancer cases by size and lymph node status; and this information enables us to estimate the number of positive lymph nodes associated with a given tumor diameter for cancers in the regional stage. We use this information and the size of the tumor at diagnosis to estimate the stage at diagnosis at a more detailed level. Table 3 shows our method for determining the detailed stage at diagnosis as a function of the tumor size (i.e., diameter d) and lymph node involvement.

Survival and mortality sub-model
The "de-identified" data set (BCSC, 2010) that is randomly sampled in the natural-history simulation does not include information about death ages for women without cancer.
To assign a death age for women who do not die of cancer before 2020, we use life tables provided by Rosenberg (2006) in which breast cancer has been removed as a cause of death. This table contains data for every birth year in the period 1900-2000. In the natural-history simulation, each year a woman has a probability of death occurring from causes other than breast cancer; and this probability is assigned according to the aforementioned life tables. For each woman who does not die from breast cancer, we compute her age at death from other causes; and we store this quantity for use in statistical calculations and for use when the same population is re-simulated in the screening-and-treatment simulation. This approach allows the natural-history simulation to differentiate between breast cancer-related deaths and deaths from other causes, making it possible to calculate life-years saved by using different screening policies and the number of cancer deaths averted in any given year.

Population growth sub-model
It is important to account for changes in the size of the sim- to this time series with adjusted R 2 = 0.7806. We started in the year 2000 because we wanted to capture the recently increasing trend in the designated population, and the end of the year 2000 is the beginning of the simulation "warmup" period. We generated annual increases in the size of the simulated population of U.S. women of age 65+ as follows.
After the events of a given year (e.g., the first year) have occurred in the simulation and the current year has been advanced (e.g., to the second year), we know the number of women who died in the previous year, the number of women still alive at the beginning of the current year, and the predicted percentage of growth (8) in the simulated population for the current year. Therefore, we can estimate the expected increase in the size of the simulated population for the current year; and the number of women actually entering the simulated population at the beginning of the current year is sampled from a Poisson distribution whose mean is our estimate of the expected population increase.

Key assumptions
1. All of the women in the initial simulated population are postmenopausal, and none of these women have had a previous diagnosis of breast cancer. 2. After the first year, only women turning 65 enter the simulated population. 3. None of the cases of breast cancer in the simulated population are treated. 4. Each woman in the simulated population who has not yet been diagnosed with breast cancer has a probability of developing breast cancer that depends on her individual risk factors. 5. Each year we evaluate the logistic regression equation (1) of Barlow et al. (2006) for each woman covered by Assumption 4 to compute her conditional probability of developing breast cancer during that year given her specific risk-factor values. 6. Cancer may be detected by clinical presentation of symptoms or by mammography. 7. For each woman in the simulated population receiving a diagnosis of breast cancer, with probability 0.1958 she is diagnosed with DCIS; otherwise, she is diagnosed with invasive cancer. 8. DCIS does not progress to invasive cancer, so each woman diagnosed with DCIS continues in the simulation until she dies from other causes or the simulation ends. 9. For each woman diagnosed with invasive breast cancer, Equations (2) to (4) are used to determine the tumor size at diagnosis and the cancer onset time. 10. For each woman diagnosed with invasive cancer, Equations (5) to (7) are used to determine the stage of breast cancer at diagnosis as a function of the tumor size at diagnosis. 11. For each woman in the simulated population, the life tables of Rosenberg (2006) are used to determine the projected time at which death would occur from other causes. 12. As detailed in Section 7, each year Equation (8) is used to determine the increase in the size of the simulated population.
Assumption 2 warrants some elaboration. We recognize that some women older than 65 may immigrate to the United States from elsewhere, but we assume that this effect is negligible.

Operation of the natural-history simulation
All of the sub-models were created in Arena (Kelton et al., 2010) and incorporated into the natural-history simulation. Each basic experiment consists of 10 independent replications (runs) of the simulation in which each run starts at the beginning of 2001 and stops at the end of 2020. We chose the latter ending time because of the difficulty of predicting changes in the technology of breast cancer screening and treatment beyond that point.
In the terminology of experimental design, the simulated population of older women sampled from the BCSC (2010) data set represents a random block effect; and by using independent replications of the simulated population, we expand the inference space of our two-phase simulation experiments to encompass the entire future population of U.S. women of age 65+. Moreover, for each simulated population used in the second-phase screening-and-treatment simulation, we evaluate all of the alternative screening policies under the same experimental conditions-that is, in the same random block; and this approach enables us to estimate more accurately the true mean differences in performance between alternative screening policies by removing the random block effect (Box et al. (2005), Section 3.3).
The initial simulated population of size 20 582 is randomly sampled from the "de-identified" BCSC (2010) database. There are several reasons for selecting this initial size for the simulated population, which is 0.1% of the designated population of U.S. women of age 65+ as of December 31, 2000(U.S. Census Bureau, 2009). The BCSC (2010) data set only contains information about 250 569 different combinations of risk factors associated with women in the study who were at least 65. In our judgment, the next larger choice for the initial size of the simulated population would be 205 820, which is 1% of the designated population of U.S. women of age 65+ at the simulation starting time. Having over 100 000 entities in a process-interaction simulation can significantly increase execution time, and we found that 10 runs with the initial simulated population size of 20 582 was sufficient to obtain accurate point and CI estimators of relevant performance measures.
At the start of the simulation, all arrays and global variables (including the current year) are assigned their initial values, and the initial simulated population is created. After a woman's initial age is assigned in the simulation, her individual risk-factor attributes are randomly sampled from the BCSC (2010) data set. We subdivided this data set according to age such that each 5-year age group (i.e., 65-69, 70-74, etc.) has its own subset of risk-factor records. Each woman in the simulated population is assigned risk-factor attributes from a record that is randomly sampled without replacement from the subset of all records corresponding to her 5-year age group. (Here we use the term "attribute" to describe each characteristic that is uniquely and specifically assigned to an individual woman in the simulated population.) After all of her risk-factor attributes have been assigned at the beginning of the current year, each woman in the simulation enters the sub-models for cancer incidence and for survival and mortality, where we determine not only her probability of being diagnosed with breast cancer during the current year but also her probability of death from other causes during that year. There are four possible outcomes for each woman in any given year: death from other causes, a new detection of breast cancer, death from breast cancer after detection, or no change in health status. The Barlow risk equation (1) is used to compute the woman's probability of developing breast cancer during the current year.
If a woman is diagnosed with invasive breast cancer in the current year, then we assume that a primary tumor is present and that this tumor grows from onset to its final size (i.e., the size at which it becomes lethal or its size when the affected woman dies from other causes) according to her individualized version of the tumor growth Equations (2) to (4). After assigning the values of N(t cd ), N(t md ), and N L so that the method of detection and the lethal tumor size are determined, we calculate the following breast cancer attributes for the affected woman: (i) the cancer onset age (i.e., the woman's age when the tumor started growing); (ii) the time delay from onset to mammographic detection; (iii) the time delay from onset to clinical detection; and (iv) the time delay from onset to lethal size. Next, we sample the woman's stage of breast cancer at diagnosis from the probability distribution of stages specified by Equations (5) to (7). Then the woman is returned to the survival and mortality sub-model to determine her ultimate cause of death. (If a woman with invasive cancer is still alive at the end of 2020, then she is deemed a survivor of invasive cancer, although we do compute her cause of death and age at death after 2020.) If a woman is not diagnosed with breast cancer in the current year, then life tables are used to determine her probability of death from other causes during that year. If she will die from other causes during the current year, then her date of death is randomly distributed over the days of that year and her death age is recorded, along with the fact that a death from other causes occurred. The woman then departs the simulation, and all of her attributes are recorded upon her departure.
If a woman neither dies nor develops breast cancer during the current year, then she experiences no other events in that year.
After all events of the current year have occurred in the natural-history simulation, the current year is advanced; and, as detailed in Section 7, Equation (8) is used to deter-mine the number of women who will turn 65 and enter the simulated population. These women enter the simulation, are assigned their initial attributes, and then join the rest of the simulated population in the sub-models for cancer incidence and for survival and mortality. The current year is simulated in the same manner as the previous year, and this logic is repeated until the end of the simulation time horizon is reached.
The next section presents the results of the naturalhistory simulation, including graphs of 95% CIs for incidence rates, prevalence rates, and death rates as well as 95% CIs for some relevant population characteristics and important inferred distributions.

Analysis of simulation results
Selected results from the natural-history simulation are presented in this section. The full set of results is available in the Online Supplement. In this section we discuss the following simulation-generated statistics.
1. Cancer incidence: The estimated number of newly diagnosed cases of breast cancer in a given year per 100 000 women in the designated population of U.S. women of age 65+. 2. Cancer prevalence: The estimated total number of existing cases of breast cancer in a given year per 100 000 women in the designated population, regardless of the time of diagnosis. 3. Deaths: The estimated number of cancer deaths, deaths from causes other than breast cancer, and the total number of deaths in the designated population each year. 4. Population size: The estimated size of the designated population each year, inferred by multiplying the corresponding size of the simulated population by 1000.

Breast cancer incidence
We begin with breast cancer incidence rates, which are reported as invasive cancer incidence rates, DCIS incidence rates, and total incidence rates. Here we present the total cancer incidence rates in Fig. 5; the Online Supplement contains graphs of incidence rates for invasive cancer and DCIS presented separately. For comparison and validation purposes, Fig. 5 depicts the estimated total incidence rate, upper and lower limits of a 95% CI estimator for the total incidence rate, and the relevant age-adjusted SEER data. Figure 5 shows that the total cancer incidence rate in the natural-history simulation is considerably higher than age-adjusted SEER data over the period 2001-2008, when SEER data were available and the simulation was in the "warm-up" period. SEER data are representative of the actual level of screening and diagnosis in the population over the period 2001-2008, during which time annual screening was clearly imperfect. Therefore, we would expect that with perfect visibility of the natural history of breast cancer for each individual in the simulated population, the simulationbased total incidence rates would be considerably higher than what we actually see under imperfect screening and adherence in the real population, which is reflected in this graph. It is not possible to estimate how much higher incidence rates should be in the simulation compared with SEER data, but we know that the simulation-based results should be significantly larger, and the natural-history simulation displays such behavior. The simulation-based incidence rates appear to be reasonable under the assumption of perfect visibility.

Breast cancer prevalence
Next we examine breast cancer prevalence percentages (which are easily converted to prevalence rates per 100 000 women of age 65+), with separate percentages for invasive cancer prevalence, DCIS prevalence, and total prevalence. We use prevalence percentages to match the format of SEER reports. Figure 6 depicts simulation-based point and CI estimates of the total prevalence percentages; the Online Supplement contains similar graphs of prevalence  percentages for invasive cancer and DCIS. Table 4 provides SEER prevalence data by age group. Prevalence rates require a "warm-up" period before achieving a "representative" behavior pattern in about the year 2012 (the beginning of the time period used for evaluating the performance of alternative screening-and-treatment policies). The reason for this is that we begin with a simulated population that is cancer-free, so the initial prevalence percentage is zero. However, after several years the prevalence percentage in the simulated population begins to mimic that of the real population; and it is the "warmedup" simulation results that are comparable with SEER data.
The SEER prevalence percentage of 4.62% for ages 65-69 is plotted in Fig. 6 as a horizontal dashed blue line and is the primary value we use for comparison because by the end of our simulated time horizon, the majority of the women in the simulated population are between the ages of 65 and 74. Additionally, the SEER prevalence percentage increases only slightly to 4.83% for ages 70-74. We would expect "representative" prevalence from the naturalhistory simulation to be less than the SEER prevalence for the designated population of U.S. women of age 65+ for one major reason-in the simulation we are not treating any of the cancers that are detected. The SEER data also includes women whose cancers were diagnosed prior to age 65 but who are still alive after turning 65. When cancers are treated, the women with those cancers remain alive much longer than if the cancers were left untreated, leading to a greater buildup (or prevalence) of breast cancer within the designated population compared with the simulated population. Since the natural-history simulation assumes no treatment, women with breast cancer are dying from that disease at a faster rate in the simulated population than in the designated population; and this phenomenon leads to relatively lower prevalence rates in the simulated population. Beyond the warm-up period, the total cancer prevalence rate fluctuates between 4.3 and 4.5% in the simulated population, which is slightly less than the values reported by SEER, as we would expect. Taking all the foregoing factors into consideration, we judged the "warmed-up" simulation results to be remarkably close to the corresponding SEER data.

Other key performance measures
9.3.1. Death rates Death rates per 100 000 women can be divided into breast cancer death rates, death rates from other causes, and total death rates. In the Online Supplement, we plot each of these rates over the simulation's time horizon, including the sample mean, upper and lower limits of a 95% CI for the theoretical death rate, and SEER data for comparison and validation purposes. Breast cancer death rates require a "warm-up" period before achieving a "representative" behavior pattern in about the year 2012 similar to the situation for prevalence. The reason for this is that we begin each run with a cancer-free simulated population, so it takes several years for breast cancer to start having lethal effects on the women in the simulation. Again, after several years the breast cancer death rate in the simulated population begins to mimic that of the designated (real) population; and it is the "warmed-up" simulation results that are useful to compare with SEER age-adjusted data. Since in the simulation we have perfect annual observation of each woman's health state and in addition we are not treating any cancers, we would expect the breast cancer death rate from the natural-history simulation to be significantly higher than the breast cancer death rate actually occurring in the designated population. In the first place, we are detecting a greater number of breast cancers in the simulated population compared with the designated population. In the second place, all breast cancers remain untreated in the natural-history simulation, whereas in the designated (real) population the majority of breast cancers are treated. After the "warm-up" period, the breast cancer death rates in the natural-history simulation are about 2.5 times as large as the SEER breast cancer death rates. We know the breast cancer death rates in the simulated population should be greater than those in the designated population, but it is impossible to know how much greater the breast cancer death rates should be in the simulated population. We judged that the natural-history simulation delivered reasonable estimates of the breast cancer death rates for the period 2012-2020 under the assumptions of perfect annual observation of each woman's health state and no treatments for breast cancer. 9.3.2. Population growth As previously stated, the initial simulated population size in 2001 was chosen to be 0.10% of the size of the designated population of U.S. women who were at least 65 at the end of the year 2000; and the latter size was obtained from U.S. Census data. We chose this value because it allows us to easily compare output from the simulation with actual data from the designated population, by either multiplying the output of the simulation by 1000 or dividing the actual size of the designated population by 1000. Earlier, we asserted that women of age 65+ would soon become the prevalent breast cancer patient cohort, partially because the older female population is expected to grow rapidly in the near future, and additionally because of their increased risk of developing breast cancer. The age distribution of the initial simulated population was derived from 2000 U.S. Census data, and population growth occurs because women turning 65 enter the simulation on a yearly basis. When suitably rescaled, the simulated population size can be compared with the actual size of the designated population for each year in the period 2001-2010. Figure 7 shows the estimated mean rescaled size of the simulated population based on 10 independent replications of the natural-history simulation superimposed on the actual size of the designated population. The graph shows that the rescaled size of the simulated population is tracking the size of the designated population closely, which led us to conclude that our population growth equation was adequate for predicting population growth during the period 2012-2020. This graph also shows that the size of the simulated population grows by about 50% over the 20-year simulation, which is representative of the rapidly growing older population in the United States.

Model limitations
As with all mathematical and computer-based models, effective use of the natural-history simulation requires a thorough understanding of the assumptions on which the model is based and of the resulting limitations on the model's applicability in practice. The key limitations of this model are as follows.
1. We considered the designated population of U.S. women of age 65+; thus, the model should only be used to evaluate screening policies for this population. To extend the model to other age groups would require updating the relevant model parameters. 2. We assumed that all risk factors aside from age, including family history of breast cancer and BMI, remained constant over time. This may not be true of all risk factors for all women in the designated population. 3. We found good evidence that Equation (1) was a wellcalibrated model of a woman's risk of receiving a confirmed diagnosis of breast cancer within a year after a screening exam, so that Equation (1) was suitable for use in the natural-history simulation. On the other hand, the relatively low value c = 0.624 of the concordance statistic may indicate the need in some contexts for a bettercalibrated predictive model based on a more complete set of risk factors or a more complex functional dependence on the relevant risk factors. In the context of diagnostic testing, patients (and their physicians) are primarily interested in whether they have the disease given the test result, which depends on the calibration of the test; and having such information is quite different from knowing the sensitivity and specificity of the test-i.e., the discrimination of the test ("do patients with the disease have higher risk predictions than those who do not?"), which is reflected in the c-statistic (Steyerberg et al., 2010). The goal of the natural-history model is to accurately capture the incidence of the disease within a population (i.e., "do close to q out of every 100 patients with a risk prediction of q% have the disease?"), which requires an accurately calibrated model (Steyerberg et al., 2010). 4. In the natural-history simulation, a woman diagnosed with DCIS does not progress to invasive cancer. More complete information about the behavior of DCIS and its progression to invasive breast cancer is required to eliminate this limitation of the natural-history simulation. 5. In the absence of sufficient information about the relationship between tumor growth characteristics and a woman's age, we assumed that in the extended Gompertz equations (2) to (4) representing tumor growth, the individualized growth parameter b has a lognormal distribution whose mean and standard deviation are linear functions of the woman's age at the time her breast cancer is diagnosed. In contrast to Equation (4), a more accurate method for estimating a woman's cancer onset time might be based on an age-dependent profile of her tumor growth parameter over her lifetime; but such a development must be based on more complete (and more recent) information than is available in Bloom et al. (1962) and Norton (1988).

Summary and conclusions
Our natural-history simulation model of untreated breast cancer in U.S. women of age 65+ is primarily distinguished from other analytic and simulation models of breast cancer in its formulation and use of individualized sub-models representing the incidence and progression of the disease-that is, for each woman in the simulated population, the on-set, growth, and spread of a tumor depend explicitly on the woman's individual risk factors. The resulting database of simulation-generated natural histories of untreated breast cancer drives the screening-and-treatment simulation, which similarly exploits individualized sub-models of screening, treatment, survival, and accumulation of cost for each individual in the simulated population. Such an approach allows for the sharpest possible comparisons of alternative breast cancer screening policies because the associated differences in performance are evaluated for each woman separately and then aggregated over the entire simulated population. The overall structure of our naturalhistory simulation model of untreated breast cancer is designed to accommodate refinements and extensions of its sub-models to address the limitations detailed in Section 9.4. Implementing those improvements is the subject of ongoing work. Surveillance Consortium co-operative agreement (U01CA63740, U01CA86076, U01CA86082, U01CA63736, U01CA70013, U01CA69976, U01CA63731, U01CA70040). The collection of cancer data used in this study was supported in part by several state public health departments and cancer registries throughout the United States. For a full description of these sources, please see http://www.breastscreening.cancer.gov/work/ acknowledgement.html.

Supplemental Material
Supplemental data for this article can be accessed on the publisher's website at www.tandfonline.com/uiie.