A simulation study on proton computed tomography (CT) stopping power accuracy using dual energy CT scans as benchmark.

ABSTRACT Background. Accurate stopping power estimation is crucial for treatment planning in proton therapy, and the uncertainties in stopping power are currently the largest contributor to the employed dose margins. Dual energy x-ray computed tomography (CT) (clinically available) and proton CT (in development) have both been proposed as methods for obtaining patient stopping power maps. The purpose of this work was to assess the accuracy of proton CT using dual energy CT scans of phantoms to establish reference accuracy levels. Material and methods. A CT calibration phantom and an abdomen cross section phantom containing inserts were scanned with dual energy and single energy CT with a state-of-the-art dual energy CT scanner. Proton CT scans were simulated using Monte Carlo methods. The simulations followed the setup used in current prototype proton CT scanners and included realistic modeling of detectors and the corresponding noise characteristics. Stopping power maps were calculated for all three scans, and compared with the ground truth stopping power from the phantoms. Results. Proton CT gave slightly better stopping power estimates than the dual energy CT method, with root mean square errors of 0.2% and 0.5% (for each phantom) compared to 0.5% and 0.9%. Single energy CT root mean square errors were 2.7% and 1.6%. Maximal errors for proton, dual energy and single energy CT were 0.51%, 1.7% and 7.4%, respectively. Conclusion. Better stopping power estimates could significantly reduce the range errors in proton therapy, but this requires a large improvement in current methods, which may be achievable with proton CT.

Accurate stopping power determination is required to calculate the ranges of protons and is critical for precise dose calculations and planning in proton therapy. The primary advantage of proton therapy is the finite proton range, which leads to significant dose sparing of the normal tissue in the patient [1,2]. The precision given by this finite range also makes proton therapy less robust to differences between the treatment planning models and reality [3-6]. Errors in the estimated stopping power lead to errors in the predicted proton range, which in turn may lead to failure of the treatment. The clinical approach to handling such errors is to increase the dose margins used, with a larger dose to normal tissue as the result, potentially offsetting the achievable precision of proton therapy. State-of-the-art for estimating stopping power in vivo is the stoichiometric method based on single-energy x-ray computed tomography (CT) scans [7]. This method is, however, limited by the fact that no direct relation between x-ray attenuation coefficient and stopping power exists. The result of this is systematic uncertainties of up to 3.4% in biological materials [8] and potentially much higher in other materials, such as plastic and metal inserts. Several methods have been proposed to remedy this, such as measuring PET activity during treatment [1] or monitoring dose induced tissue changes via magnetic resonance imaging (MRI) [9], but these methods are still far from being clinically adaptable.
Proton CT has been widely tested for the purpose of estimating stopping power directly [10-13], and a prototype head scanner is currently being tested at the Loma Linda University Hospital, as well as a smaller prototype scanner at INFN-LNS, Catania, Italy. These are, however, still limited by the fact that current scanning techniques are quite slow and take up valuable treatment beam time. In addition, the maximum range of the protons could be a limiting factor for larger patients [13].
Another attractive option is dual energy x-ray CT (DECT), which is now clinically available from several vendors. By obtaining photon attenuation coefficients from two different x-ray spectra in each voxel, electron density and atomic number can be calculated. Yang et al. [14] demonstrated that this could be used for stopping power calculations, and recently a few groups compared the method with experiments and found an agreement better than 1.0% [15-17]. In these studies a single phantom was investigated and the majority of materials investigated were from the same phantom vendor.
In this work we investigated the stopping power estimation accuracy of proton CT for two large phantoms (∼20-30 cm) from two vendors using Monte Carlo simulations accounting for position and energy resolution of current proton CT prototypes. To establish whether the potential accuracy gains from proton CT are relevant, the phantoms were scanned on a clinically available dual energy CT scanner and stopping power was estimated using dual and single energy methods. The experimental x-ray CT data were used to establish the single and dual energy CT accuracy levels currently achievable. To our knowledge this is the first simulation study evaluating the potential gains from proton CT using x-ray CT of the same phantoms as reference and the first study investigating dual energy CT on more than one phantom.

Phantom measurements
A dual source CT scanner (Siemens SOMATOM Definition FLASH, Siemens Medical, Forchheim, Germany) was used for all x-ray scans. For dual energy scans, the low spectrum was 80 kVp and the high spectrum was 140 kVp filtered with a Sn filter. For single energy scans, a standard 120 kVp spectrum was used. An RMI 467 electron density phantom (Gammex, Middleton, WI, USA), diameter 32 cm, was scanned with both dual and single energy. The phantom consists of 16 cylindrical inserts of different tissue-like materials. A second phantom, the Model 002H5 IMRT Phantom (CIRS, Inc., Norfolk, VA, USA), with elliptical cross section (30 cm wide and 20 cm thick), containing five different tissue-mimicking inserts, was also scanned and used for validation. Insert composition and density for both phantoms can be found in tables 1 and 2 of Landry et al. [13]. The images were imported into MATLAB (MathWorks, Inc., Natick, MA, USA) and the average HU values were extracted for each insert using a region of interest covering 50% of the insert, in a central slice of the phantom. All x-ray CT scans were performed with a computed tomography dose index (CTDI) of 30 mGy (for DECT this value was for the sum of the contributions of the two scans).

X-ray CT calibration
For DECT, electron density was calculated using the method of Saito [18], and the effective atomic number using the method of Landry et al. [19]. These values were then used to calculate stopping power, following the method of Yang [14]. The necessary DECT calibrations were carried out using the Gammex phantom data.
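As a rough illustration of the DECT chain described above, the sketch below computes a relative electron density from a weighted subtraction of the low- and high-energy CT numbers (the form used in Saito's method) and then a stopping power ratio relative to water from the Bethe formula without shell or density corrections. The calibration constants alpha, a and b, the function names, and the water mean excitation energy of 75 eV are illustrative assumptions, not values taken from this study; in practice the mean excitation energy of each material would be derived from its effective atomic number.

```python
import math

M_E_C2_EV = 510998.95   # electron rest energy in eV
M_P_C2_MEV = 938.272    # proton rest energy in MeV


def electron_density_saito(hu_low, hu_high, alpha, a, b):
    """Relative electron density from a weighted subtraction of DECT numbers.

    alpha, a and b are scanner-specific calibration constants (hypothetical
    here; they would be fitted to scans of a calibration phantom).
    """
    delta_hu = (1.0 + alpha) * hu_high - alpha * hu_low
    return a * delta_hu / 1000.0 + b


def beta_squared(kinetic_energy_mev):
    """Squared velocity (in units of c) of a proton with the given kinetic energy."""
    gamma = 1.0 + kinetic_energy_mev / M_P_C2_MEV
    return 1.0 - 1.0 / gamma ** 2


def stopping_number(mean_excitation_ev, kinetic_energy_mev):
    """Bethe stopping number, neglecting shell and density corrections."""
    b2 = beta_squared(kinetic_energy_mev)
    return math.log(2.0 * M_E_C2_EV * b2 / (mean_excitation_ev * (1.0 - b2))) - b2


def stopping_power_ratio(rel_electron_density, mean_excitation_ev,
                         kinetic_energy_mev=100.0, water_mean_excitation_ev=75.0):
    """Stopping power relative to water at the given proton energy."""
    return (rel_electron_density
            * stopping_number(mean_excitation_ev, kinetic_energy_mev)
            / stopping_number(water_mean_excitation_ev, kinetic_energy_mev))


if __name__ == "__main__":
    # Placeholder CT numbers and calibration constants, for illustration only.
    rho_e = electron_density_saito(hu_low=60.0, hu_high=40.0, alpha=0.5, a=1.0, b=1.0)
    print(rho_e, stopping_power_ratio(rho_e, mean_excitation_ev=73.0))
```

The default energy of 100 MeV matches the energy at which the ground truth stopping power was evaluated in this work. Because the mean excitation energy enters only logarithmically, the electron density dominates the result.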
For SECT, stopping power was calculated using two methods. The first approach calibrated CT numbers directly from the Gammex phantom data, while the second was based on the stoichiometric method [7,20], which makes use of reference human tissue data [21].

Proton CT simulation and reconstruction
The phantoms described above were simulated in the proton CT simulation and reconstruction framework described in [22] using the manufacturer specifications for composition and density. Each phantom was scanned using a uniform field of 250 MeV protons with 0.5 MeV energy spread at a CT equivalent dose index (CTEDI) of 10 mSv [22]. We simulated a detector spatial resolution of 200 μm and an energy resolution equivalent to max(0.03E, 24.15 MeV²/E + 1.76 MeV), where E is the incident proton energy [23].

Accuracy evaluation
In all cases, the ground truth stopping power was calculated at 100 MeV based on the stoichiometric information provided by the phantom manufacturers. The mean excitation potentials, required for stopping power calculation, were based on the recommendations in [24]. We estimated the uncertainty on the ground truth calculations using data from [17], where stopping power for inserts estimated using a water column was reported along with the manufacturer's reported composition.
A more complete description of the methods listed above is presented in the Materials and Methods part of the Appendix, and calibration curves for DECT and SECT are shown in Supplementary Figure 1 (available online at http://informahealthcare.com/doi/abs/10.3109/0284186X.2015.1061212).

Results
Figure 1 presents the comparison of proton CT stopping power accuracy for the Gammex phantom compared to SECT and DECT results. For this phantom the root mean square (RMS) error for proton CT was 0.2%. In comparison, the stoichiometric SECT calibration yielded an RMS error of 2.7%, and the phantom calibration 1.6%. DECT RMS error was 0.5%. The maximum absolute error for proton CT was 0.4%, compared to errors of up to 7.4% for SECT and 1.0% for DECT. For DECT, the results were consistently within 2σ of ground truth. This was also true for proton CT except in the case of cortical bone, which showed a minor underestimation of 0.4 ± 0.2%. Figure 2 presents the results for the second phantom, which was not used in x-ray CT calibration. The RMS error for proton CT was 0.3%, whereas for stoichiometric SECT calibration, phantom SECT calibration and DECT they were 1.6%, 3.6% and 0.9%, respectively. The maximum error for proton CT was 0.5% compared with 11% for SECT and 1.7% for DECT.
For proton CT, all results were within 2σ of the ground truth, but both DECT and SECT showed statistically significant deviations. The uncertainty from the theoretical calculation of stopping power was found to be σ_theory = 0.49%.

Discussion
The results presented in Figures 1 and 2 are in agreement with published literature. The traditionally quoted value of 3.5% uncertainty on stopping power for SECT [8] corresponds to the RMS errors reported here, which ranged from 1.6% to 3.7%, depending on the calibration procedure used. For DECT, RMS error for the Gammex phantom was 0.5% with a maximum error of 1%. These results show excellent agreement with the accuracy levels reported in other studies [15,17] and are of the order of the uncertainty estimated in this work for the ground truth. The DECT performance, when evaluated outside of calibration conditions by scanning different materials in a smaller phantom, was found to be slightly degraded, with an RMS error of 0.9% and maximum errors of up to 1.7%. The effect of using different phantoms had not been reported previously.
Our results suggest that DECT accuracy remains below 2% across phantoms, but it may be necessary to perform a systematic investigation of the range of patient sizes where a given DECT calibration holds.
The main objective of this work was to assess whether proton CT offers any advantage over clinically available stopping power estimation methods. The stopping power accuracies reported here for the simulated proton CT showed maximal errors of 0.5%. Comparing with literature, Zygmanski et al. [25] achieved an agreement better than 0.5% when compared with experimental stopping power measurements, although the setup for proton CT was somewhat different than the one used in this study. In a later study, Hurley et al. [26] also achieved 0.5% accuracy in an experimental proton CT of a water phantom.
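The RMS and maximal errors quoted in this work can be computed from per-insert estimates along the lines of the following minimal sketch. It assumes that errors are expressed relative to the ground truth in percent; the function names are hypothetical, and the insert names and stopping power values are invented placeholders, not data from this study.

```python
import math


def relative_errors_percent(estimated, truth):
    """Per-insert error of the estimate relative to ground truth, in percent."""
    return {name: 100.0 * (estimated[name] - truth[name]) / truth[name]
            for name in truth}


def rms(values):
    """Root mean square of a collection of numbers."""
    values = list(values)
    return math.sqrt(sum(v * v for v in values) / len(values))


def max_abs(values):
    """Largest absolute value in a collection of numbers."""
    return max(abs(v) for v in values)


if __name__ == "__main__":
    # Placeholder relative stopping powers, for illustration only.
    truth = {"insert_a": 1.000, "insert_b": 1.100, "insert_c": 1.600}
    estimated = {"insert_a": 1.002, "insert_b": 1.098, "insert_c": 1.605}
    errors = relative_errors_percent(estimated, truth)
    print(f"RMS error: {rms(errors.values()):.2f}%")
    print(f"Maximum error: {max_abs(errors.values()):.2f}%")
```

Reporting both numbers is useful because the RMS error summarises typical performance across inserts, while the maximum error exposes a single poorly estimated material.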
Minimum stopping power accuracy requirements for proton therapy treatment planning are not easily distilled from the literature and are essential in identifying whether the potential accuracy gains from proton CT are relevant. If we assume a requirement of a 1% uncertainty on range, then the DECT results presented here fail to provide this level of accuracy for all materials, while proton CT shows promise. If we aim for 2% uncertainty, then the use of DECT would be sufficient and the benefits of the development of proton CT may be questionable. It may be necessary to perform proton dose calculation on clinically relevant geometries imaged with the methods discussed in this work, to establish whether significant range errors exist.
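To make these thresholds concrete, a first-order estimate treats a systematic relative stopping power error as an equal relative error in water-equivalent range. The sketch below applies that simplifying assumption to the RMS errors reported in this work for the second phantom; the 20 cm range is an arbitrary example value and the function name is hypothetical.

```python
def range_error_mm(range_cm, spr_error_percent):
    """First-order range error in mm.

    Assumes the same relative stopping power error along the whole beam path,
    so the relative range error equals the relative stopping power error.
    """
    return range_cm * 10.0 * spr_error_percent / 100.0


if __name__ == "__main__":
    example_range_cm = 20.0  # arbitrary example depth, not from the study
    # RMS errors reported for the second phantom in this work.
    rms_errors_percent = {
        "proton CT": 0.3,
        "DECT": 0.9,
        "SECT (stoichiometric)": 1.6,
    }
    for method, error in rms_errors_percent.items():
        print(f"{method}: {range_error_mm(example_range_cm, error):.1f} mm")
```

In a real patient the errors differ between tissues and partly cancel along the path, which is why dose calculations on clinically relevant geometries are needed to settle the question.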
Proton CT does not suffer from the artifacts present in x-ray CT, such as beam hardening, making it more robust to variations in patient size. Proton CT also has the advantage that it would image the patient directly in the treatment position, which could reduce setup errors. However, development of in situ dual energy cone beam CT scanners may provide similar capability.
The two largest advantages of x-ray CT over current prototype proton CT are scanning speed and spatial resolution. Modern x-ray CT scanners allow for time-resolved scans, which in the context of radiotherapy is particularly important for moving targets, e.g. in the lung. Older studies in proton CT report quite poor resolution [6], worse than 3 mm, which would be prohibitive in radiotherapy treatment planning. However, recent work has shown that proton CT in theory can achieve similar spatial resolution to x-ray CT by using path modeling [27], but this has yet to be tested experimentally.
In conclusion, we have confirmed previously reported stopping power accuracy gains from DECT when using a single phantom for calibration and evaluation, and identified that accuracy can be slightly degraded when deviating from calibration conditions by using a phantom of different size and composition. Simulation of proton CT of the same phantoms showed that, in theory, proton CT may offer superior accuracy to DECT. Whether proton CT is necessary is highly dependent on the clinical requirements for stopping power estimation.