Diagnostic accuracy of endoscopic ultrasound and intraductal ultrasonography for assessment of ampullary tumors: a meta-analysis

Abstract
Background: Accurate preoperative assessment of ampullary tumors (ATs) is critical for determining the appropriate treatment. The reported diagnostic accuracy of endoscopic ultrasound (EUS) and intraductal ultrasonography (IDUS) for assessing tumor depth (T-staging) and regional lymph node status (N-staging) varies across studies.
Method: An electronic search of the MEDLINE and Embase databases was conducted to identify studies that assessed the diagnostic accuracy of EUS and IDUS for ATs. Sensitivities and specificities of eligible studies were summarized using either a fixed-effects or a random-effects model.
Results: Twenty-one studies were included in the final analysis. The pooled sensitivity and specificity of EUS were 0.89 and 0.87 for T1, 0.76 and 0.91 for T2, 0.81 and 0.94 for T3, and 0.72 and 0.98 for T4, respectively. For IDUS, the estimates from five studies were 0.90 and 0.88 for T1, 0.73 and 0.91 for T2, and 0.79 and 0.97 for T3, respectively. For N-staging, 16 EUS studies were included, with a pooled sensitivity and specificity of 0.61 and 0.77, respectively; the corresponding estimates for IDUS were 0.61 and 0.92.
Conclusion: Our results suggest that EUS and IDUS have good diagnostic accuracy for T-staging of ATs. However, the accuracy of EUS and IDUS is less satisfactory for N-staging. More well-designed prospective studies are warranted to confirm our findings.


Introduction
Ampullary tumors (ATs) originate from the ampulla of Vater itself, distal to the bifurcation of the distal common bile duct and the pancreatic duct [1]. ATs have been increasingly diagnosed over recent decades owing to the widespread use of endoscopic and radiological imaging for other indications [2].
Removal of ATs is recommended in most cases because of their malignant potential, especially when symptoms are present [1]. However, radical surgery carries substantial risks, with reported mortality rates of 0–13% and morbidity rates of 25–63% [3]. In contrast, endoscopic ampullectomy appears to be a valid alternative to surgery for ATs. Despite its high rate of radical resection and low recurrence rate, adverse events such as pancreatitis and hemorrhage should not be neglected [4]; a meta-analysis showed that the overall rate of adverse events was up to 24.9% [5]. Therefore, accurate preoperative assessment of ATs is crucial for triaging patients to endoscopic or surgical treatment.
The recently published European Society of Gastrointestinal Endoscopy (ESGE) guideline recommends endoscopic ultrasound (EUS) and intraductal ultrasonography (IDUS) for locoregional staging of ATs, albeit on the basis of low-quality evidence [6].
The main advantage of EUS is that the transducer can be placed close to the lesion without interference from fat, bowel gas or bone. IDUS provides real-time, high-quality cross-sectional images, and previous studies have indicated that its diagnostic yield is equivalent to or slightly greater than that of EUS [7,8].
Various studies have evaluated the diagnostic accuracy of EUS or IDUS for T- and N-staging of ATs, but their results vary considerably. To our knowledge, a meta-analysis published in 2014 investigated the accuracy of EUS alone in ATs and found that EUS had a moderate strength of agreement with histopathology in determining T- and N-staging [9]. Since almost a decade has passed, new studies on the diagnostic accuracy of EUS and IDUS have been published, adding to the body of evidence. We therefore conducted a systematic review and meta-analysis to update the current evidence.
Diagnostic Test Accuracy Working Group to perform this meta-analysis [10]. Two investigators (Ye XH and Wang L) independently performed a computerized search of MEDLINE (from 1 January 1966 to 31 December 2021) and Embase (from 1 January 1974 to 31 December 2021) databases to identify potentially relevant articles. The search was carried out using the following keywords: endosonography ('endoscopic ultrasound', 'EUS', 'intraductal ultrasonography' and 'IDUS'), ampullary ('ampulla' and 'papilla') and tumor ('malignancy', 'neoplasm', 'cancer' and 'adenoma'). Manual searches of the bibliographies from these potential articles were also performed to identify additional studies.

Study selection
Two investigators (Ye XH and Wang L) independently reviewed potentially relevant articles for eligibility. Studies were included if they met the following criteria: (1) retrospective or prospective design, published as a full manuscript; (2) 10 or more patients in whom EUS or IDUS was used to evaluate ATs; (3) an appropriate reference standard (endoscopic or surgical pathology); (4) reported absolute numbers of true-positive, false-negative, true-negative and false-positive observations for ATs, or sufficient data to construct a 2 × 2 contingency table; and (5) ATs staged according to the Tumor Node Metastasis (TNM) classification [11]. Case reports, editorials, review articles and clinical guidelines were excluded. Any disagreements were resolved by consensus.
For T-staging, T1 refers to a lesion limited to the ampulla of Vater or the sphincter of Oddi; T2 to invasion of the duodenal muscularis propria/duodenal wall; T3 to invasion of the pancreas; and T4 to invasion of the peripancreatic soft tissue or of adjacent organs or structures other than the pancreas. For N-staging, patients were classified as N1 if malignant regional lymph nodes were found on surgical pathology and as N0 if no malignant regional lymph nodes were detected.

Data extraction and quality assessment
A custom-made standardized form was used for data extraction. For each eligible study, the following data were extracted: surname of first author, publication year, region of the study population, study design, sample size (the number of patients with ATs), details of the endosonographic type (EUS or IDUS, radial or linear) and reference standard (surgical or endoscopic pathology).
The methodological qualities of the studies were assessed using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool [12]. This tool consists of four key categories: patient selection, index test, reference standard and flow and timing. Each category was assessed in terms of risk of bias, and the first three categories were also considered in terms of applicability.
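To make the QUADAS-2 bookkeeping concrete, the sketch below encodes one assessment per study as a small Python data structure and counts the studies rated high or unclear risk of bias in at least one of the four categories, which is the summary reported in the Results. The class and function names, the domain keys and the two example studies are hypothetical; this is not the official QUADAS-2 tool or the authors' extraction form.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable

# Allowed QUADAS-2 judgements (hypothetical string encoding).
JUDGEMENTS = {"low", "high", "unclear"}

# The four QUADAS-2 categories; only the first three also carry an applicability rating.
BIAS_DOMAINS = ("patient_selection", "index_test", "reference_standard", "flow_and_timing")
APPLICABILITY_DOMAINS = BIAS_DOMAINS[:3]


@dataclass
class Quadas2Assessment:
    study: str
    risk_of_bias: Dict[str, str]
    applicability: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Every category needs a valid risk-of-bias judgement.
        for domain in BIAS_DOMAINS:
            if self.risk_of_bias.get(domain) not in JUDGEMENTS:
                raise ValueError(f"{self.study}: invalid risk-of-bias rating for {domain}")
        # Applicability may only be rated for the first three categories.
        for domain, rating in self.applicability.items():
            if domain not in APPLICABILITY_DOMAINS or rating not in JUDGEMENTS:
                raise ValueError(f"{self.study}: invalid applicability rating for {domain}")

    def high_or_unclear_risk(self) -> bool:
        """True if any of the four categories is rated 'high' or 'unclear' risk of bias."""
        return any(self.risk_of_bias[d] != "low" for d in BIAS_DOMAINS)


def count_high_or_unclear(assessments: Iterable[Quadas2Assessment]) -> int:
    """Number of studies with high or unclear risk in one or more categories."""
    return sum(a.high_or_unclear_risk() for a in assessments)


# Two made-up studies, for illustration only.
studies = [
    Quadas2Assessment("Study A", {d: "low" for d in BIAS_DOMAINS}),
    Quadas2Assessment(
        "Study B",
        {**{d: "low" for d in BIAS_DOMAINS}, "flow_and_timing": "unclear"},
        {"index_test": "low"},
    ),
]
print(count_high_or_unclear(studies))  # 1
```

Validating judgements in `__post_init__` means a mistyped rating fails at data entry rather than silently counting as "not low".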

Data synthesis and statistical analysis
The 2 × 2 tables (numbers of true-positive, false-negative, true-negative and false-positive results) were constructed from the data of the included studies. The pooled sensitivity, specificity, positive likelihood ratio (PLR), negative likelihood ratio (NLR) and diagnostic odds ratio (DOR) were calculated [13]. The summary estimates of sensitivity and specificity, with their 95% confidence intervals (CI) and prediction region, were presented with a summary receiver operating characteristic (SROC) curve, and the area under the curve (AUC) was calculated [14]. Most clinical tests have an AUC between 0.5 and 1.0; the closer the AUC is to 1.0, the better the diagnostic performance [15].
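The per-study quantities named above follow directly from each 2 × 2 table. The sketch below computes them for a single study; the 0.5 continuity correction applied when a cell is zero is a common convention assumed here, not a detail stated in the text, and the counts in the example are made up. Pooled PLR, NLR and DOR from meta-analysis software are estimated across studies, so they need not equal values derived from pooled sensitivity and specificity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    tp: int  # index test positive, reference standard positive
    fp: int  # index test positive, reference standard negative
    fn: int  # index test negative, reference standard positive
    tn: int  # index test negative, reference standard negative


def accuracy_metrics(t: TwoByTwo, correction: float = 0.5) -> dict:
    """Sensitivity, specificity, PLR, NLR and DOR for one study."""
    cells = (t.tp, t.fp, t.fn, t.tn)
    # Assumed convention: add 0.5 to every cell if any cell is zero,
    # so that the ratios below stay finite.
    c = correction if 0 in cells else 0.0
    tp, fp, fn, tn = (x + c for x in cells)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    plr = sens / (1 - spec)      # how much a positive result raises the odds of the stage
    nlr = (1 - sens) / spec      # how much a negative result lowers the odds of the stage
    dor = (tp * tn) / (fp * fn)  # equals PLR / NLR
    return {"sensitivity": sens, "specificity": spec, "PLR": plr, "NLR": nlr, "DOR": dor}


# Hypothetical study: 20 histological T1 tumors, 30 non-T1 tumors.
m = accuracy_metrics(TwoByTwo(tp=18, fp=3, fn=2, tn=27))
print({k: round(v, 3) for k, v in m.items()})
# {'sensitivity': 0.9, 'specificity': 0.9, 'PLR': 9.0, 'NLR': 0.111, 'DOR': 81.0}
```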
Heterogeneity was assessed using the Q statistic and quantified using I². For the Q test, p < .10 was considered to indicate statistical heterogeneity. I² is the proportion of total variation attributable to between-study variation rather than chance. Deeks' test was used to evaluate publication bias [16]. All statistical analyses were carried out using Meta-DiSc (version 1.4; Unit of Clinical Biostatistics, Ramón y Cajal Hospital, Madrid, Spain) and Stata (version 12.0; StataCorp, College Station, Texas, USA). A p value less than .05 was considered statistically significant.
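The text does not spell out the exact pooling algorithm used by the software named above. As one common textbook approach, assumed here rather than taken from the paper, the sketch below pools logit-transformed sensitivities (or specificities) by inverse-variance weighting, computes Cochran's Q with its χ² p value, derives I² = (Q - df)/Q, and adds a DerSimonian-Laird random-effects estimate. Bivariate models, often used for diagnostic accuracy meta-analysis, would give somewhat different results. The study counts in the example are hypothetical.

```python
import math
from typing import List, Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square distribution with integer df >= 1."""
    if x <= 0:
        return 1.0
    # Start from the closed form for df = 2 (even df) or df = 1 (odd df) ...
    k, sf = (2, math.exp(-x / 2)) if df % 2 == 0 else (1, math.erfc(math.sqrt(x / 2)))
    # ... then step up two degrees of freedom at a time (incomplete-gamma recurrence).
    while k < df:
        sf += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return sf


def expit(z: float) -> float:
    return 1 / (1 + math.exp(-z))


def logit_with_variance(events: int, total: int) -> Tuple[float, float]:
    """Logit of a proportion (e.g. TP / (TP + FN)) and its approximate variance."""
    e, n = float(events), float(total)
    if e == 0 or e == n:  # continuity correction (assumed convention)
        e, n = e + 0.5, n + 1.0
    return math.log(e / (n - e)), 1 / e + 1 / (n - e)


def pool_proportions(studies: List[Tuple[int, int]]) -> dict:
    """Pool (events, total) pairs across studies on the logit scale."""
    if len(studies) < 2:
        raise ValueError("need at least two studies")
    ys, vs = zip(*(logit_with_variance(e, n) for e, n in studies))
    w = [1 / v for v in vs]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw

    # Cochran's Q and I^2 (share of variation beyond chance).
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))
    df = len(ys) - 1
    i2 = (q - df) / q if q > df else 0.0

    # DerSimonian-Laird between-study variance and random-effects estimate.
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)
    w_re = [1 / (v + tau2) for v in vs]
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se_re = math.sqrt(1 / sum(w_re))

    return {
        "fixed": expit(y_fixed),
        "random": expit(y_re),
        "random_95ci": (expit(y_re - 1.96 * se_re), expit(y_re + 1.96 * se_re)),
        "Q": q, "df": df, "Q_p": chi2_sf(q, df), "I2": i2, "tau2": tau2,
    }


# Hypothetical sensitivities as (TP, TP + FN) from four studies.
print(pool_proportions([(18, 20), (25, 30), (40, 44), (9, 15)]))
```

With the threshold stated in the text, heterogeneity would be flagged when `Q_p < 0.10`.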
Baseline characteristics of the included studies are summarized in Table 1. The eligible studies were published between 1988 and 2019 and were performed in 12 regions. Sample sizes ranged from 12 to 120, with a total of 736 cases. Risk of bias and applicability concerns were assessed with the QUADAS-2 tool, and the results are shown in Figure 2. Thirteen of the 21 studies were judged to be at high or unclear risk in one or more of the four key categories.
With regard to IDUS, five studies were included for evaluation of T-staging [8,22,26,28,33]. The pooled sensitivity and specificity were 0.90 (n = 5, 95% CI 0.82–0.95) and 0.88 (95% CI 0.78–0.95) for T1, 0.73 (n = 5, 95% CI 0.54–0.88) and 0.91 (95% CI 0.85–0.95) for T2, and 0.79 and 0.97 for T3. Figure 5 shows the sensitivity and specificity of IDUS in diagnosing the various T stages, and the SROC curves and AUC values of IDUS for T-staging are shown in Figure 6. The test of heterogeneity had a p value > .10 for all pooled IDUS estimates except the specificity for T2. PLR, NLR and DOR of IDUS for the various T stages are shown in Table 3. Furthermore, two studies grouped T3 and T4 tumors together, so additional analyses for T3-4 were performed [28,33]. The pooled sensitivity and specificity were 0.80 (n = 2, 95% CI 0.44–0.97) and 0.91 (95% CI 0.82–0.96) for EUS, and 0.80 (n = 2, 95% CI 0.44–0.97) and 0.93 (95% CI 0.84–0.97) for IDUS, respectively. Heterogeneity in specificity was significant for both EUS and IDUS (p < .05). The AUC, PLR, NLR and DOR values of EUS and IDUS for T3-4 ATs are shown in Tables 2 and 3.

Assessment of heterogeneity
Heterogeneity was mainly found in sensitivity (T1) and specificity (T1, T2, T3 and N-staging) of the summarized EUS results. Subgroup analyses were performed based on publication year (before 2000 vs. after 2000), area (eastern countries vs. western countries), EUS technique (radial EUS only vs. radial or linear EUS) and study design (retrospective vs. prospective). For T1, the heterogeneity in sensitivity was attributed to EUS technique and study design. The heterogeneity in specificity (T1, T2, T3 and N-staging) was eliminated or reduced to varying degrees when subgrouping based on publication year, area, EUS technique and study design, suggesting that these factors might be significant contributors (Supplementary Table 1).
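The subgroup logic can be sketched as follows: split the studies by a covariate (here, a hypothetical publication-period label), recompute I² for logit-transformed specificity within each subgroup, and compare it with the overall value. The records, field names and counts are made up for illustration, and the univariate fixed-effect Q used here is an assumed simplification, not necessarily the model the authors ran.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def i_squared(pairs: List[Tuple[int, int]]) -> float:
    """I^2 for logit-transformed proportions given (events, total) pairs."""
    ys, ws = [], []
    for e, n in pairs:
        e, n = float(e), float(n)
        if e == 0 or e == n:  # continuity correction (assumed convention)
            e, n = e + 0.5, n + 1.0
        ys.append(math.log(e / (n - e)))
        ws.append(1 / (1 / e + 1 / (n - e)))  # inverse of the logit variance
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sum(ws)
    q = sum(w * (y - y_bar) ** 2 for w, y in zip(ws, ys))
    df = len(pairs) - 1
    return (q - df) / q if q > df else 0.0


def subgroup_i2(studies: List[dict], key: str) -> Dict[str, float]:
    """I^2 of specificity (TN / (TN + FP)) within each level of `key`."""
    groups: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for s in studies:
        groups[s[key]].append((s["tn"], s["tn"] + s["fp"]))
    # I^2 is undefined for a single study, so such subgroups are skipped.
    return {level: i_squared(p) for level, p in groups.items() if len(p) >= 2}


# Hypothetical studies (not data from the included studies).
studies = [
    {"period": "before 2000", "tn": 12, "fp": 8},
    {"period": "before 2000", "tn": 13, "fp": 7},
    {"period": "after 2000", "tn": 38, "fp": 2},
    {"period": "after 2000", "tn": 19, "fp": 1},
]
overall = i_squared([(s["tn"], s["tn"] + s["fp"]) for s in studies])
print(round(overall, 2), subgroup_i2(studies, "period"))
```

In this toy example the overall I² is high while both subgroups show none, which is the pattern that would point to the grouping variable as a source of heterogeneity.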

Publication bias
Publication bias was assessed using Deeks' funnel plot. If present, publication bias results in a higher proportion of small studies with large effect sizes compared with larger studies. In the funnel plot, the vertical axis represents the inverse of the square root of the effective sample size, and the horizontal axis represents the DOR. With the exception of EUS for T4 (p < .01), all Deeks' funnel plots were symmetrical with respect to the regression line, and the asymmetry tests revealed no evidence of publication bias. The funnel plots assessing publication bias for EUS and IDUS in T- and N-staging of ATs are shown in Figure 9.
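Deeks' asymmetry test regresses each study's log DOR on 1/√ESS, weighting by ESS, where ESS = 4·n₁·n₂/(n₁ + n₂) with n₁ patients who have the condition and n₂ who do not; a slope clearly different from zero suggests funnel-plot asymmetry. The sketch below is a plain-Python version under that description. The function names and the four 2 × 2 tables are hypothetical, and the 0.5 continuity correction is an assumed convention. It returns the slope, its standard error and the t statistic; a p value can then be obtained from a t distribution with k - 2 degrees of freedom, for example with `scipy.stats.t.sf` if SciPy is available.

```python
import math
from typing import Iterable, Tuple


def effective_sample_size(n_diseased: float, n_nondiseased: float) -> float:
    """ESS = 4 * n1 * n2 / (n1 + n2)."""
    return 4 * n_diseased * n_nondiseased / (n_diseased + n_nondiseased)


def deeks_test(tables: Iterable[Tuple[int, int, int, int]], correction: float = 0.5) -> dict:
    """Weighted regression of ln(DOR) on 1/sqrt(ESS); tables are (tp, fp, fn, tn)."""
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in tables:
        ess = effective_sample_size(tp + fn, fp + tn)
        if 0 in (tp, fp, fn, tn):  # keep ln(DOR) finite (assumed convention)
            tp, fp, fn, tn = (c + correction for c in (tp, fp, fn, tn))
        xs.append(1 / math.sqrt(ess))
        ys.append(math.log(tp * tn / (fp * fn)))
        ws.append(ess)
    k = len(ys)
    if k < 3:
        raise ValueError("need at least three studies")

    # Weighted least squares with weights ESS.
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    if sxx == 0:
        raise ValueError("all studies have the same effective sample size")

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum(w * (y - intercept - slope * x) ** 2 for w, x, y in zip(ws, xs, ys))
    se = math.sqrt(rss / (k - 2) / sxx)
    # t is undefined (NaN) if the points lie exactly on the regression line.
    t = slope / se if se > 0 else float("nan")
    return {"slope": slope, "se": se, "t": t, "df": k - 2}


# Hypothetical tables in which the smaller studies report larger DORs.
tables = [(9, 1, 1, 9), (18, 3, 2, 17), (35, 10, 5, 30), (70, 15, 10, 65)]
print(deeks_test(tables))
```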

Discussion
In this study, we conducted a systematic review and meta-analysis of the diagnostic accuracy of EUS and IDUS for ATs. Our main findings were that both EUS and IDUS had acceptable sensitivities (0.72–0.89 for EUS; 0.73–0.90 for IDUS) and specificities (0.87–0.98 for EUS; 0.88–0.97 for IDUS) for T-staging of ATs, whereas the accuracy of EUS and IDUS was less satisfactory for N-staging. The AUC values of EUS (0.89–0.95) and IDUS (0.87–0.95) for T-staging were close to 1, indicating that both are excellent tests for T-staging of ATs. With regard to N-staging, the pooled results suggest that both EUS (sensitivity, 0.61; specificity, 0.77; AUC, 0.74) and IDUS (sensitivity, 0.61; specificity, 0.92; AUC, 0.87) are suboptimal. Because intraductal extension is a predictor of incomplete endoscopic resection and recurrence [34,37], data on intraductal extension were also collected. According to our pooled results, EUS is useful in evaluating intraductal extension of ATs (sensitivity, 0.79; specificity, 0.88; AUC, 0.92).
To the best of our knowledge, this is the first meta-analysis to quantitatively summarize all available evidence on both EUS and IDUS in the locoregional staging of ATs. A previous systematic review, published in 2014, included 14 studies on the diagnostic accuracy of EUS alone for ATs [9]. We included 21 studies, strengthening the body of evidence. Several included studies reported T-staging results as 'T3-4', which precluded separate data extraction for T3 and T4. However, the prior systematic review treated 'T3-4' as T3 and T4 separately [28,33].
Figure 9. Funnel plots assessing bias for T- and N-staging of EUS and IDUS for ATs. EUS: endoscopic ultrasound; IDUS: intraductal ultrasonography; ATs: ampullary tumors.
The main strength of our meta-analysis is that we performed additional analyses of the studies that reported T-staging as 'T3-4' and found that the performance of EUS and IDUS in diagnosing T3-4 tumors was comparable with the other results, thereby improving the accuracy of the estimates.
The incidence of ATs has increased in clinical practice owing to the widespread use of routine screening endoscopy and imaging modalities [2]. Consequently, accurate locoregional assessment is of great importance for selecting the optimal treatment modality. The ESGE guideline published in 2021 recommends EUS and IDUS for locoregional staging of ATs, although the quality of evidence is low [6]. The guideline also recommends that other imaging modalities, such as abdominal magnetic resonance cholangiopancreatography (MRCP), be used for staging of ATs. In clinical practice, CT, MRCP and transabdominal ultrasound are traditionally used in combination with EUS for preoperative staging of ATs. Indeed, several studies suggest that EUS performs significantly better than CT and transabdominal ultrasound for T-staging, and comparably to or slightly (but not significantly) better than MRCP [17,18,24,25,38]. Lymph node metastasis is a well-established prognostic factor for ATs [36]. Although EUS and IDUS are less reliable for N-staging than for T-staging, they can still help clinicians stratify patients' risk of lymph node metastasis and guide the selection of optimal treatment. In addition, studies have shown that the performance of EUS does not differ significantly from that of MRCP and CT [22,30,31,33]. Recently, several reports have described EUS-guided fine-needle aspiration (FNA) for ATs, which might be another diagnostic option, with a reported sensitivity of 82.4% and specificity of 100% [39,40].
IDUS uses a higher-frequency ultrasound probe (20–30 MHz) and thus produces higher-resolution images than EUS [41]. IDUS provides superior differentiation between the sphincter of Oddi and the duodenal wall because less tissue is compressed during scanning. Previous studies have shown that the diagnostic yield of IDUS is comparable to or slightly higher than that of EUS [7,8] and that IDUS helps select appropriate candidates for endoscopic ampullectomy [28]. However, as the guideline recommends, the routine use of IDUS should be weighed against training requirements, costs and the risk of pancreatitis [6]. Moreover, the number of included patients with ATs who underwent IDUS was limited, and larger series with longer follow-up are needed to establish its clinical significance.
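The link between probe frequency and resolution can be made concrete with the ultrasound wavelength λ = c/f: axial resolution is on the order of half the spatial pulse length and therefore scales with λ. The worked numbers below assume the conventional soft-tissue sound speed of 1540 m/s and, for comparison, a representative EUS frequency of 7.5 MHz; that EUS frequency is an illustrative assumption, not a value given in this article.

$$
\lambda = \frac{c}{f},\qquad
\lambda_{\text{IDUS},\,20\ \text{MHz}} = \frac{1540\ \text{m/s}}{20\times10^{6}\ \text{Hz}} \approx 0.077\ \text{mm},\qquad
\lambda_{\text{IDUS},\,30\ \text{MHz}} \approx 0.051\ \text{mm},\qquad
\lambda_{\text{EUS},\,7.5\ \text{MHz}} \approx 0.21\ \text{mm}
$$

Under these assumptions the IDUS wavelength is roughly 2.7 to 4 times shorter, consistent with its finer axial resolution; the trade-off, shallower penetration at higher frequencies, is not quantified here.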
Our data carry clinical implications. The DOR, PLR and NLR are shown in Tables 2 and 3. The DOR is the ratio of the odds of a positive test result in patients who truly have the histological stage in question to the odds of a positive result in patients who do not. For instance, a DOR of 28.31 for T1 means that the odds of a positive result for T1 were 28.31 times higher in patients with histological T1 tumors than in those without. This helps physicians determine the treatment strategy with confidence. The PLR is a measure of how well the test identifies a disease, whereas the NLR is a measure of how well the test excludes it [42]. Compared with the DOR, likelihood ratios (PLR and NLR) are considered more clinically practical. The results indicate that both EUS and IDUS perform well in both confirming and excluding the correct T stage of ATs.
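How likelihood ratios translate into clinical decisions can be illustrated with Bayes' theorem in odds form: post-test odds = pre-test odds × LR. The sketch below derives approximate PLR and NLR from the pooled EUS sensitivity (0.89) and specificity (0.87) for T1 reported in the Abstract; these derived values are only an illustration and will not exactly match the separately pooled likelihood ratios in Table 2. The 50% pre-test probability is a hypothetical assumption.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple:
    """Return (PLR, NLR) implied by a sensitivity/specificity pair."""
    plr = sensitivity / (1 - specificity)
    nlr = (1 - sensitivity) / specificity
    return plr, nlr


def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Bayes' theorem in odds form: post-test odds = pre-test odds x LR."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Pooled EUS point estimates for T1 from the Abstract; the pre-test probability is hypothetical.
plr, nlr = likelihood_ratios(0.89, 0.87)
pre = 0.5
print(f"PLR = {plr:.2f}, NLR = {nlr:.2f}")  # PLR = 6.85, NLR = 0.13
print(f"P(T1 | EUS says T1)     = {post_test_probability(pre, plr):.2f}")  # 0.87
print(f"P(T1 | EUS says not T1) = {post_test_probability(pre, nlr):.2f}")  # 0.11
```

Working on the odds scale keeps the update a single multiplication, which is why likelihood ratios are often preferred over the DOR at the bedside.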
Some limitations of our study merit consideration. First, most included studies were retrospective in design, which may have led to overestimation of diagnostic accuracy. Second, some studies did not differentiate benign ampullary adenomas from ampullary cancers. Third, the definition of lymph node metastasis varied across studies, which may have introduced selection bias. Fourth, only a small number of studies and cases were available for IDUS, which is insufficient to draw robust conclusions; the estimates of IDUS performance for ATs may therefore be less reliable, and more data from additional studies are needed to resolve this issue. Last, intraobserver variability is known to exist in EUS and IDUS interpretation and may have affected the accuracy of our analyses.
In summary, EUS and IDUS are highly accurate techniques for T-staging and for assessing intraductal extension of ATs. However, they are less satisfactory for N-staging because of their modest sensitivities and specificities. Test results must be interpreted with caution in specific clinical contexts. More well-designed prospective studies are needed.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
This work was supported by Natural Science Foundation of Zhejiang Province (LQ19H030003) and Key Project of Jinhua Science and Technology Bureau (2018A32022).