Performance evaluation of head and neck contour adaptation with cone beam CT using two commercial software systems.

1Department of Medical Physics, Crown Princess Mary Cancer Centre, Westmead Hospital, Westmead, Sydney, NSW, Australia, 2Department of Medical Physics, Nepean Cancer Care Centre, Nepean Hospital, Sydney, NSW, Australia, 3Institute of Medical Physics, School of Physics, University of Sydney, Sydney, NSW, Australia and 4Department of Medical Physics, Blacktown Cancer Care Centre, Blacktown Hospital, Sydney, NSW, Australia

To the Editor,
Radiotherapy is one of the main treatment techniques for head and neck cancer and is generally delivered in daily fractions, five times a week over 6-7 weeks [1]. During this time, there is often a change in anatomy due to weight loss, tumour shrinkage or organ motion/deformation. Previous studies have found a reduction in parotid gland volumes of 16-24% [2,3]. Several studies have investigated the use of adaptive radiotherapy (ART), during which the treatment plan can be adapted to account for changes seen during the treatment [4,5]. However, ART can be a time-consuming activity and as such difficult to implement in a busy clinical department due to increased workload [6]. Deformable image registration (DIR) is a tool which could potentially improve the efficiency of ART. DIR determines the spatial correspondence between points in a reference image [the planning computed tomography (CT)] and the target image (the replan CT). This allows the contours originally delineated on the planning CT to be deformed to the new anatomy in the rescan CT.
The use of additional CT scans for ART is well documented [4,5,7]. In future, cone beam CT (CBCT) scans, which are acquired routinely for image guidance at the time of beam delivery, could be used to complement, or replace, a second CT scan. This would involve the recalculation of dose and dose-volume histograms (DVH) on CBCTs as part of the process to decide if and when plan adaptation is required in order to meet initial plan goals [4,5,7]. The aim of this study was to measure the accuracy of DIR transfer of contours from a planning CT to both a rescan CT and a CBCT with two commercial DIR systems.

Material and methods
Twenty head and neck radiotherapy patients who were previously treated at our institution were randomly selected for this retrospective study. The treatment sites were predominantly early stage node positive oropharynx and nasopharynx (supplementary material, Supplementary Table I available online at http://informahealthcare.com/doi/abs/10.3109/0284186X.2015.1068448). All patients were treated radically, with a prescribed dose of 66-70 Gy in 33-35 fractions.
Each patient had a planning (CT1) scan and a prospective rescan (CT2) approximately halfway through their 30-33 fraction treatment, at an average of 46 days after CT1 as per clinical protocol (both CTs were acquired with IV contrast). Each patient underwent a CBCT scan (without IV contrast) approximately once a week throughout their treatment. Scanning parameters and further information can be found in the supplementary material available online at http://informahealthcare.com/doi/abs/10.3109/0284186X.2015.1068448. The organs at risk (OARs) (listed in Table I) were contoured on the CT1, CT2 and CBCT scans by a single radiation oncologist (RO) in the Eclipse™ Treatment Planning System (Varian Medical Systems, Palo Alto, CA, USA). The images and contours were then exported to the DIR software. The RO's contours were taken as the benchmark, allowing comparison with the deformed contours. To ensure consistency and avoid variability due to multiple observers, the same RO performed all contouring. To assess the intra-observer variability in contouring, five randomly selected patients were re-contoured by the same RO at least four months after the initial contouring took place.
Two commercially available DIR software packages were used in this study to determine the deformation between CT1 and CT2, and between CT1 and CBCT, and to deform the structures accordingly. These were MIM™ Maestro (version 6.4.3, MIM Software Inc., Cleveland, OH, USA) (MIM) and Velocity AI™ (version 3.1.0, Varian Medical Systems).
CT1 was registered deformably to CT2, and the RO contours from CT1 were transferred to CT2 and compared to the RO's contours delineated on CT2. This was performed independently on both software systems and repeated using CBCT.
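The contour transfer step can be illustrated with a minimal sketch. It assumes the registration result is available as a displacement that maps each voxel of the target image back to its corresponding position in the planning image, and it uses a 2D grid with nearest-neighbour sampling for brevity. The function name `propagate_mask` and the toy displacement are hypothetical and do not represent the internal method of either commercial system.

```python
def propagate_mask(ref_mask, displacement):
    """Warp a binary 2D mask onto the target grid by nearest-neighbour pull-back.

    ref_mask: list of rows of 0/1 (structure filled on the planning image).
    displacement: function (row, col) -> (d_row, d_col) giving, for each
        target voxel, the offset to its corresponding planning-image voxel
        (an assumed representation of the registration result).
    """
    n_rows, n_cols = len(ref_mask), len(ref_mask[0])
    warped = [[0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            d_row, d_col = displacement(r, c)
            src_r, src_c = round(r + d_row), round(c + d_col)
            # Voxels that map outside the planning image stay empty.
            if 0 <= src_r < n_rows and 0 <= src_c < n_cols:
                warped[r][c] = ref_mask[src_r][src_c]
    return warped


def shift_left(row, col):
    """Toy displacement: target (row, col) corresponds to planning (row, col + 1)."""
    return (0.0, 1.0)


# Toy planning contour: a 2x2 structure in a 4x5 image.
planning_mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
adapted_mask = propagate_mask(planning_mask, shift_left)
```

In this toy case the structure appears one voxel further left on the target grid. A clinical displacement field would vary from voxel to voxel and would normally be sampled with interpolation rather than rounding.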
The RO and DIR-produced contours were compared using the volume ratio (VR, the ratio of RO volume to DIR-produced volume), the maximum, mean and standard deviation of the Hausdorff distance (HD, which is an unsigned distance to agreement), the signed distance to agreement (DTA) and the normalised Dice similarity coefficient (nDSC) [8,9]. Details of the DTA and nDSC calculation can be found in the supplementary material available online at http://informahealthcare.com/doi/abs/10.3109/0284186X.2015.1068448.

Results
Fifteen of the 20 patients lost 2 kg or more during their treatment (mean 5%, range of 16 - 3%), leading to clinically significant anatomical changes (Supplementary Table II [...] values of HD ranged between 2.7 and 15.6 mm for CT2 and 3.0 and 17.4 mm for CBCT. While the increased range of VR and DTA for CBCT was expected due to the decrease in image quality inherent in CBCT compared to CT, a paired t-test showed there was no statistically significant difference in mean values (p > 0.05) between the CT2 and CBCT results for individual OARs.
The mean of the standard deviation of DTA (σRO) in RO contouring (over all OARs and all patients) was calculated to be 1.8 mm for CT and 1.6 mm for CBCT (Table I).
The standard deviation of DTA for DIR-produced contours (σDIR) was similar to that seen for the variation in RO contouring (σRO), although slightly larger in most cases (Figure 1, Table I). Across the two software systems, DTA (σDIR) for Velocity AI was found to be greater than for MIM with CT-CT (p < 0.05, paired Wilcoxon signed rank), but there was no significant difference with CT-CBCT.
The mean nDSC was found to be 0.94 and 0.86 for MIM and Velocity AI, respectively, for CT-CT DIR, with overall 34% and 7% of contours reaching an nDSC of 1 (Supplementary Table IV, available online at http://informahealthcare.com/doi/abs/10.3109/0284186X.2015.1068448). The percentage of individual OARs reaching an nDSC of 1 was 26% and 29% for CT1-CT2 and CT1-CBCT, respectively, for MIM, whilst for Velocity AI this was 11% and 19%. MIM had a statistically significantly higher nDSC than Velocity AI (p < 0.05, paired Wilcoxon signed rank) for CT-CT, but there was no significant difference for CT-CBCT (Figure 2). Across all DIR techniques, the ranges of the standard deviation of volume, DSC and DTA were 0.0-4.3 cm³, 0.0-0.2 and 0.0-7.5 mm, respectively, for the five image sets that had DIR performed five independent times.
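For readers who wish to experiment with overlap and distance metrics of the kind used in this study, the following is a minimal sketch under stated assumptions. Contours are represented as sets of voxel indices of equal size, and distances are computed over whole voxel sets rather than extracted surfaces, which is a simplification. The signed DTA and the nDSC are not reproduced because their definitions are given only in the supplementary material. All function names and the toy contours are hypothetical.

```python
import math
from statistics import mean, pstdev


def volume_ratio(ro_voxels, dir_voxels):
    """Ratio of RO volume to DIR-produced volume (equal voxel size assumed)."""
    return len(ro_voxels) / len(dir_voxels)


def dice(ro_voxels, dir_voxels):
    """Dice similarity coefficient: twice the overlap over the summed sizes."""
    overlap = len(ro_voxels & dir_voxels)
    return 2 * overlap / (len(ro_voxels) + len(dir_voxels))


def nearest_distances(source, target, spacing_mm):
    """Distance in mm from every source voxel to its nearest target voxel."""
    def to_mm(voxel):
        return tuple(i * s for i, s in zip(voxel, spacing_mm))

    target_mm = [to_mm(v) for v in target]
    return [min(math.dist(to_mm(v), t) for t in target_mm) for v in source]


def unsigned_distance_stats(ro_voxels, dir_voxels, spacing_mm=(1.0, 1.0, 1.0)):
    """Maximum, mean and standard deviation of the symmetric unsigned distances.

    The maximum is the Hausdorff distance between the two voxel sets.
    """
    distances = (nearest_distances(ro_voxels, dir_voxels, spacing_mm)
                 + nearest_distances(dir_voxels, ro_voxels, spacing_mm))
    return max(distances), mean(distances), pstdev(distances)


# Toy contours: the DIR contour is the RO contour shifted by one voxel.
ro_contour = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
dir_contour = {(0, 0, 1), (0, 0, 2), (0, 1, 1), (0, 1, 2)}

vr = volume_ratio(ro_contour, dir_contour)
dsc = dice(ro_contour, dir_contour)
hd_max, hd_mean, hd_sd = unsigned_distance_stats(ro_contour, dir_contour)
```

The toy example shows why the metrics are reported together: the two contours have identical volumes, so the volume ratio alone would not reveal the one-voxel shift that the Dice coefficient and the distance statistics pick up.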

Discussion
In this study, manual contouring of OARs has been used to measure the uncertainties in the DIR for the MIM and Velocity AI systems using both CT and CBCT images. The DIR techniques were automated to investigate the implementation of automated DIR using a commercial registration system. To our knowledge, no other study has investigated this. Previous studies have discussed the errors associated with contouring by ROs [10,11] and efforts were taken to minimise this by using a single RO. In terms of the metrics used in this study, Zijdenbos et al. found a DSC threshold of 0.7 to indicate good agreement [12], while Walker et al. and Mattiucci et al. considered DSC thresholds of 0.85 and 0.8, respectively, to be clinically useful [6,13]. For the pairs of RO contours on CT2 and CBCT, 84% and 91%, respectively, had a DSC of 0.7 or greater, whereas this dropped to just 36% at a DSC of 0.85 for both CT2 and CBCT. However, the calculation of nDSC, which factors in the uncertainty in RO contouring, aimed to give some clinical value to the metrics obtained. Based on both nDSC and DTA measurements, it is clear that DIR does not perform within RO contouring uncertainty for most OARs. Further investigation, in terms of examination of contours by the RO, would be required to determine whether the results of DIR using the two software packages would be clinically acceptable.
A limitation of this study was that only OAR contours were available to assess DIR accuracy, as the study was performed as part of a clinical study only requiring the organs listed in Table I. Ideally, GTV and CTV contours would have also been available.
For adaptive planning with CT, the intended purpose of DIR would be to automatically contour structures on CBCT scans for which dose has been calculated using the current treatment plan and to then assess DVHs for acceptability against the plan objectives. With highly conformal dose distributions, small contouring errors could have a large impact on the decision or, conversely, large errors may have only a small impact, as seen by Brouwer et al., who reported large NTCP differences exceeding 10% for a small minority of patients when investigating different contouring guidelines.
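The sensitivity of a DVH-based decision to the contour can be illustrated with a small sketch. It assumes the dose in each voxel of a structure is available as a list of values in Gy. The voxel doses and the mean-dose objective below are invented for illustration only and are not data from this study.

```python
def cumulative_dvh(doses_gy, dose_levels_gy):
    """Fraction of the structure volume receiving at least each dose level."""
    n_voxels = len(doses_gy)
    return [sum(d >= level for d in doses_gy) / n_voxels
            for level in dose_levels_gy]


def mean_dose(doses_gy):
    """Mean dose over all voxels in the structure (equal voxel size assumed)."""
    return sum(doses_gy) / len(doses_gy)


# Hypothetical voxel doses (Gy) inside the same organ as contoured by the RO
# and by DIR; the DIR contour includes one extra voxel in a high-dose region.
ro_doses = [10.0, 20.0, 30.0, 40.0]
dir_doses = [10.0, 20.0, 30.0, 40.0, 50.0]

MEAN_DOSE_OBJECTIVE_GY = 26.0  # hypothetical plan objective

ro_meets_objective = mean_dose(ro_doses) <= MEAN_DOSE_OBJECTIVE_GY
dir_meets_objective = mean_dose(dir_doses) <= MEAN_DOSE_OBJECTIVE_GY

levels = [0.0, 25.0, 50.0]
ro_dvh = cumulative_dvh(ro_doses, levels)
dir_dvh = cumulative_dvh(dir_doses, levels)
```

In this constructed case a single additional voxel in a steep dose gradient moves the mean dose from 25 Gy to 30 Gy and reverses the outcome against the objective, which is the kind of effect a contour comparison metric alone would not reveal.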
A further limitation of this study was that the contour comparison metrics did not indicate whether the decision to adapt or not would have differed between the RO and DIR contours. This is the subject of planned further investigation.
In conclusion, for the two DIR systems evaluated in this study, automatic contours adapted from CT1 to CT2 using DIR were found to have an uncertainty of similar size to the inherent intra-clinician uncertainty for the majority of OARs. MIM was found to have smaller variations than Velocity AI for CT1 to CT2 adaptation. The MIM system was found to have increased variations for CT1-CBCT adaptation, which were similar in size to the variations found with Velocity AI. These results suggest that automated DIR has the potential for use in transforming contours from CT1 to CT2 with some editing required, but may be limited for determining whether replanning is required using CBCT images.