Comparing Common Retinal Vessel Caliber Measurement Software with an Automatic Deep Learning System

Abstract
Purpose: To compare the Retina-based Microvascular Health Assessment System (RMHAS) with Integrative Vessel Analysis (IVAN) for retinal vessel caliber measurement.
Methods: Eligible fundus photographs from the Lingtou Eye Cohort Study were obtained alongside their corresponding participant data. Vascular diameter was automatically measured using IVAN and RMHAS software, and intersoftware variations were assessed by intra-class correlation coefficients (ICCs) with 95% confidence intervals (CIs). Scatterplots and Bland–Altman plots assessed the agreement between programs, and a Pearson’s correlation test assessed the strength of associations between systemic variables and retinal calibers. An algorithm was proposed to convert measurements between software for interchangeability.
Results: ICCs between IVAN and RMHAS were moderate for CRAE and AVR (ICC; 95% CI) (0.62; 0.60 to 0.63 and 0.42; 0.40 to 0.44, respectively) and excellent for CRVE (0.76; 0.75 to 0.77). When comparing retinal vascular caliber measurements between tools, mean differences (MD; 95% confidence intervals) in CRAE, CRVE, and AVR were 22.34 µm (–7.29 to 51.97), –7.01 µm (–37.68 to 23.67), and 0.12 (–0.02 to 0.26), respectively. The correlation of systemic parameters with CRAE/CRVE was poor, and the correlations of CRAE with age, sex, and systolic blood pressure, and of CRVE with age, sex, and serum glucose, were significantly different between IVAN and RMHAS (p < 0.05).
Conclusions: CRAE and AVR correlated moderately between retinal measurement software systems while CRVE correlated well. Further studies confirming this agreement and interchangeability in large-scale datasets are needed before the software programs are deemed comparable in clinical practice.


Introduction
The retina is a neurovascular tissue located on the posterior surface of the eye and is the only microvascular structure within the human body that can be noninvasively imaged to assess cardiovascular health. Previous studies have demonstrated that changes in retinal microvascular caliber can be obtained by analyzing fundus images, and certain characteristics are associated with systemic diseases. For example, wide retinal venules are associated with a higher risk of diabetes, and retinal arteriolar narrowing may precede the development of hypertension. [1][2][3][4][5][6] Over the past two decades, a series of semi-automated software systems were developed to objectively measure the diameter of fundus vessels, including Integrative Vessel Analysis (IVAN) and Singapore I Vessel Assessment (SIVA). Although these programs considerably reduce the burden of manual labor, several studies have found poor agreement between them, which may be due to the limited "objectivity" of the measurements by the operators. Only one study has shown good to excellent agreement between IVAN and SIVA. [7][8][9] As population-based analysis of retinal fundus images becomes more popular, the lack of interchangeability of these programs is problematic and inhibits the generalizability, pooling and meta-analysis of studies.
Deep learning has recently been applied to the evaluation of retinal vascular calibers and their association with various diseases. [10][11][12] With artificial intelligence technologies improving, automated analysis of retinal fundus photographs provides an unprecedented opportunity to reduce resource costs while also being free from observer bias and fatigue. 13 Shi et al. developed and validated an automated deep learning system, the Retina-based Microvascular Health Assessment System (RMHAS), with the intention of applying it to high-throughput retinal vessel analysis. 14 Its advantage over previous software is that it fully automates the retinal analysis process, allowing fundus images to be processed in <2 s. Although it performed well, its agreement with other software has not yet been investigated. Therefore, this study compared the agreement of RMHAS with a popular semi-automated analysis software, IVAN, for retinal vascular parameter analysis.

Study population
This study included samples from the Lingtou Eye Cohort Study, an ongoing prospective study that enrolled government workers who underwent annual physical and eye examinations from 2009-2010 at the Guangzhou Civil Service Physical Examination Center. Detailed methods and examination protocols can be found elsewhere 15,16 but in brief, participants were enrolled if they were 40 years of age or older and had no history of major cardiovascular events. Baseline assessments included physical and eye examinations and a structured face-to-face interview with a questionnaire. In total, 6846 fundus photographs from 4205 subjects were included.
Written informed consent was obtained from all participants during recruitment. The study was approved by the Zhongshan Ophthalmic Center Institutional Review Board (identifier, 2017KYPJ049), and the study adhered to the tenets of the Declaration of Helsinki. Patients with missing data and no fundus photographs were excluded.

Retinal photography
Nonmydriatic standard digital fundus photographs, centered on the optic disc, were captured using a fundus camera (TRC-NW6S; Topcon, Tokyo, Japan). After the fundus photographs were retrieved from the study dataset, the images of the included subjects were graded using IVAN (shown in Figure 1(A)) and RMHAS (shown in Figure 1(B)). The grading procedures are described in detail below.

Integrative vessel analysis
Retinal vessels from fundus photos were graded according to the ARIC study classification. 17 In brief, a standardized ARIC grid (calibrated to a fixed size based on the camera resolution) was manually centered on the optic disc by a trained technician, and blood vessels in a specified area (0.5-1.0 optic disc diameters from optic disc edge) were automatically traced and identified. The traced vessels were examined by a trained grader, who manually corrected them if necessary. Based on the modified Knudtson-Parr-Hubbard formula, retinal arteriolar and venous calibers were summarized as central retinal artery equivalent (CRAE) and central retinal vein equivalent (CRVE), respectively. 18 Artery-to-vein ratio from equivalents (AVR) was obtained by dividing the CRAE by CRVE.
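The summary step described above can be sketched in Python. This is a minimal illustration that assumes the revised Knudtson formula as it is commonly stated: the six largest vessels are combined by repeatedly pairing the narrowest with the widest, using a branching constant of 0.88 for arterioles and 0.95 for venules. The function names and example widths are hypothetical and are not taken from IVAN or RMHAS.

```python
import math


def knudtson_equivalent(widths, constant):
    """Collapse vessel widths into one summary caliber (assumed revised Knudtson scheme)."""
    if not widths:
        raise ValueError("at least one vessel width is required")
    # Keep only the six largest vessels.
    current = sorted(widths, reverse=True)[:6]
    while len(current) > 1:
        current.sort()
        combined = []
        # Pair the narrowest remaining vessel with the widest remaining vessel.
        while len(current) > 1:
            narrowest = current.pop(0)
            widest = current.pop()
            combined.append(constant * math.sqrt(narrowest ** 2 + widest ** 2))
        # With an odd count, the middle vessel is carried to the next round.
        combined.extend(current)
        current = combined
    return current[0]


def crae(arteriole_widths):
    """Central retinal artery equivalent (branching constant 0.88)."""
    return knudtson_equivalent(arteriole_widths, 0.88)


def crve(venule_widths):
    """Central retinal vein equivalent (branching constant 0.95)."""
    return knudtson_equivalent(venule_widths, 0.95)


def avr(arteriole_widths, venule_widths):
    """Artery-to-vein ratio: CRAE divided by CRVE."""
    return crae(arteriole_widths) / crve(venule_widths)


# Hypothetical widths in microns, for illustration only.
example_arterioles = [110.0, 105.0, 98.0, 95.0, 90.0, 85.0]
example_venules = [150.0, 142.0, 135.0, 130.0, 122.0, 118.0]
print("CRAE:", round(crae(example_arterioles), 2))
print("CRVE:", round(crve(example_venules), 2))
print("AVR:", round(avr(example_arterioles, example_venules), 3))
```

Because the same pairing routine serves both vessel types, only the constant differs between CRAE and CRVE, and the AVR is unitless.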

Retina-based microvascular health assessment system
The fundus images were cropped to the field of view (FOV) and resized to 512 × 512 pixels before input. RMHAS measures retinal vascular morphology using custom region-specific summaries and global physical/geometric parameters. Vessel caliber was summarized as the CRAE and CRVE for the six largest arteries and veins detected in the standard area based on the modified Knudtson-Parr-Hubbard formula. 19 The AVR was calculated by dividing the CRAE by the CRVE. Since the fundus images were cropped to a fixed size before input into RMHAS, we multiplied the obtained AVR values by a corresponding magnification scale (6.3) to ensure comparability with IVAN measurements. 17,19

Systemic variables
The demographic characteristics and medical history of participants were obtained through standardized questionnaires and medical records. Blood pressure was measured by trained medical workers using an automatic upper-arm BP monitor (HBP-9020; OMRON, Osaka, Japan). Participants' height and weight were measured with an automated height and weight scale (HNH-318; OMRON) while they were wearing light clothing and no shoes. Body mass index (BMI) was calculated by dividing weight in kilograms by the square of height in meters. Venous blood was drawn to measure serum glucose and lipids based on standardized protocols.

Statistical analysis
Statistical analyses were conducted using Stata V.15.0 software (StataCorp, College Station, TX, USA) and Python 3.6. Intersoftware variations in the retinal vessel measurements were estimated using intra-class correlation coefficients (ICCs) with 95% confidence intervals (CIs). We calculated the ICC with a two-way mixed effects model, which is commonly used to assess the reliability of measurements made by multiple raters or instruments. In this model, the ICC is calculated by dividing the between-subjects variance by the sum of the between-subjects variance and the within-subjects variance. The formula for ICC in the two-way mixed effects model is as follows:

$$\mathrm{ICC} = \frac{\mathrm{MSR} - \mathrm{MSE}}{\mathrm{MSR} + (k - 1)\,\mathrm{MSE}}$$

where MSR is the mean square for rows (between subjects), MSE is the residual mean square error, and k is the number of measurements per subject. ICC values were interpreted using the following scale: 0.00-0.39 = poor; 0.40-0.69 = moderate; and 0.70-1.00 = excellent. 20 To evaluate the agreement between the software programs, we used scatterplots and Bland-Altman plots to provide a visual representation. The 95% limits of agreement (LOA) were determined as the mean difference ± 1.96 times the standard deviation (SD) of the differences. 21 A one-sample t-test comparing the mean difference with zero was used to test for systematic bias. Pearson's correlation analysis between the differences and the averages was used to test for proportional bias. For the RMHAS-derived CRAE and CRVE values, we used paired t-tests to determine whether they differed significantly from the IVAN measurements.
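The agreement statistics described above can be sketched with the Python standard library. This is an illustrative sketch, not the authors' Stata or Python code. It assumes the single-measurement, consistency form of the two-way mixed effects ICC, often written ICC(3,1) = (MSR - MSE) / (MSR + (k - 1) MSE), with MSR taken as the between-subjects mean square. The function names and sample values are hypothetical.

```python
import math
import statistics


def icc_3_1(ratings):
    """ICC(3,1) from a list of per-subject rows, each holding k measurements."""
    n, k = len(ratings), len(ratings[0])
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects (rows), instruments (columns), residual.
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse)


def interpret_icc(icc):
    """Apply the interpretation scale quoted in the text."""
    if icc < 0.40:
        return "poor"
    if icc < 0.70:
        return "moderate"
    return "excellent"


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bland_altman(first, second):
    """Mean difference (first - second), 95% LOA, and the two bias checks."""
    diffs = [a - b for a, b in zip(first, second)]
    means = [(a + b) / 2 for a, b in zip(first, second)]
    md = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return {
        "mean_difference": md,
        "loa": (md - 1.96 * sd, md + 1.96 * sd),
        # One-sample t statistic for the mean difference against zero (systematic bias).
        "t_statistic": md / (sd / math.sqrt(len(diffs))),
        # Correlation of differences with averages (proportional bias).
        "proportional_r": pearson_r(diffs, means),
    }


# Hypothetical paired calibers in microns, for illustration only.
software_a = [150.2, 143.8, 160.5, 138.9, 155.1, 147.3]
software_b = [128.4, 125.0, 133.9, 121.7, 130.2, 127.5]
icc = icc_3_1([[a, b] for a, b in zip(software_a, software_b)])
print("ICC:", round(icc, 3), interpret_icc(icc))
print(bland_altman(software_a, software_b))
```

The order of subtraction is a convention; the text does not state which program is subtracted from which, so the sketch simply uses first minus second.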
Pearson's correlation tests examined the strength of association between each systemic variable and the retinal calibers measured by IVAN and by RMHAS (correlation coefficients denoted R1 and R2, respectively). To compare R1 and R2, a Z-test was conducted for each systemic variable. The null hypothesis of this test is that the two correlation coefficients do not differ, and the alternative hypothesis is that they do. A P value <0.05 denoted a statistically significant difference between the two R-values, indicating that the strength of association differed between the two software programs.
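One common way to compare two correlation coefficients with a Z-test is Fisher's r-to-z transformation, sketched below. The text does not state which variant was used, so this is an assumption. The version shown treats the two correlations as coming from independent samples, whereas correlations computed on the same photographs are dependent and may call for a dependent-correlation test instead. The function name and input values are hypothetical.

```python
import math


def compare_correlations(r1, r2, n1, n2):
    """Two-sided Z-test for the difference between two Pearson correlations.

    Assumes independent samples of sizes n1 and n2 (an assumption of this sketch).
    """
    # Fisher r-to-z transformation makes the sampling distribution roughly normal.
    z1, z2 = math.atanh(r1), math.atanh(r2)
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / standard_error
    # Two-sided P value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical coefficients for one systemic variable, for illustration only.
z_stat, p = compare_correlations(r1=-0.15, r2=-0.08, n1=4205, n2=4205)
print("Z:", round(z_stat, 3), "P:", round(p, 4))
```

A P value below 0.05 from this test would indicate that the association with the systemic variable differs in strength between the two programs.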

Conversion algorithm
To develop an algorithm for converting retinal vessel caliber measurements from RMHAS into measurements from IVAN software, we used 80% (n = 5477) of the total 6846 fundus photographs as the training set and the remaining 20% (n = 1369) as the validation set.
The IVAN measurements were used as the outcome, and an algorithm was constructed using a linear regression method. In the training set, the coefficients of the RMHAS caliber variable were obtained from the regression model and used to construct the transformation algorithm.
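The conversion approach can be sketched as an 80/20 split followed by simple least-squares regression of the IVAN value on the RMHAS value. This is a minimal sketch under stated assumptions: a single predictor, a random split, and synthetic data. The slope, intercept, and noise used to generate the data are invented for illustration and are not the coefficients reported by the study.

```python
import random


def train_validation_split(pairs, train_fraction=0.8, seed=0):
    """Shuffle the pairs and split them into training and validation sets."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def convert(rmhas_value, slope, intercept):
    """Approximate an IVAN measurement from an RMHAS measurement."""
    return intercept + slope * rmhas_value


def mean_absolute_error(pairs, slope, intercept):
    """Average absolute gap between converted RMHAS values and IVAN values."""
    return sum(abs(convert(r, slope, intercept) - i) for r, i in pairs) / len(pairs)


# Synthetic (RMHAS, IVAN) pairs; the generating relationship is made up.
rng = random.Random(42)
pairs = []
for _ in range(500):
    rmhas = rng.uniform(120.0, 180.0)
    ivan = 0.8 * rmhas + 40.0 + rng.gauss(0.0, 5.0)
    pairs.append((rmhas, ivan))

train, validation = train_validation_split(pairs)
slope, intercept = fit_line([p[0] for p in train], [p[1] for p in train])
print("slope:", round(slope, 3), "intercept:", round(intercept, 3))
print("validation MAE:", round(mean_absolute_error(validation, slope, intercept), 3))
```

Fitting on the training set and scoring on the held-out validation set gives an estimate of how well the conversion carries over to photographs not used to derive the coefficients.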

Results
A total of 6846 fundus photographs from 4205 subjects were included. The mean age of participants was 58.88, and 2485 (59.10%) were male. The baseline characteristics of the participants are listed in Table 1. Figure 2(A-C) illustrates the correlation between CRAE, CRVE, and AVR measurements obtained by RMHAS and IVAN, respectively. Mean absolute values, standard deviations, and ICCs for CRAE, CRVE, and AVR measurements obtained by IVAN and RMHAS are presented in Table 2. The ICCs between IVAN and RMHAS were moderate for CRAE and AVR (ICC; 95%CI)(0.62; 0.60 to 0.63 and 0.42; 0.40 to 0.44, respectively) and excellent for CRVE (0.76; 0.75 to 0.77).
The results of comparing retinal vascular caliber measurements between RMHAS and IVAN are presented in Table 3. The mean differences (MD; 95% CI) in CRAE, CRVE, and AVR were 22.34 µm (–7.29 to 51.97), –7.01 µm (–37.68 to 23.67), and 0.12 (–0.02 to 0.26), respectively. Systematic bias was observed for all retinal vascular parameters, as the one-sample t-tests comparing the MD with zero were significant (p < 0.001). Proportional bias was also observed in CRAE, CRVE, and AVR, as the differences correlated significantly with the means (p < 0.001 for all), as shown in Table 3. Bland-Altman plots of agreement confirm these findings (Figure 3(A-C) for CRAE, CRVE and AVR, respectively).
Given these considerable differences, a conversion algorithm for IVAN and RMHAS was proposed: IVAN approximation calculation algorithm derived from RMHAS: [...] in Table 5. Correlations of CRAE with age, sex, and systolic blood pressure were significantly different between the retinal image analysis tools (p < 0.001, 0.001, and 0.026, respectively). This was also observed for CRVE with age, sex, and serum glucose (p < 0.001, 0.002, and 0.028, respectively). Overall, the correlations of CRAE and CRVE with systemic factors were poor (correlation coefficients <0.39).

Discussion
This study aimed to compare the performance of RMHAS and IVAN for automated retinal vessel analysis in a large-scale Chinese cohort. The ICC results indicated moderate to excellent agreement between the two software programs. However, further analysis revealed proportional bias and significant mean differences between measurements for CRAE and CRVE. In addition, some systemic variables showed significant but weak correlations with retinal vessel calibers. To address these issues, a conversion algorithm was developed to improve interchangeability and eliminate the large mean differences between measurements.
Currently, population-based studies investigating associations of retinal calibers with other phenotypic variables are being conducted using various retinal vessel analysis software, which raises concern about the interchangeability of results between studies. This also prevents data pooling and the higher-level evidence that systematic reviews and meta-analyses would provide. Indeed, previous studies have investigated the agreement between IVAN and other semi-automated retinal vessel measurement software [7][8][9][22][23][24] with inconsistent findings, the majority showing poor agreement. For example, Yip et al. found poor agreement between IVAN and SIVA for all retinal vessel parameters, while Mautuit et al. found good to excellent agreement between these two programs. 9 This study is the first to assess the agreement between a fully automated deep learning system and semi-automated IVAN, and found moderate to excellent agreement between them. Although the current findings show promise for the interchangeability of IVAN with RMHAS, a minimum ICC of 0.90 has been proposed for clinical practice interchangeability. Therefore, while the findings herein are superior to those of other agreement studies, future studies replicating this agreement are necessary to ensure that these findings can be generalized to large-scale meta-analyses and population-based analyses. Moreover, even if this is the case, a better unifying algorithm may need to be proposed if the software programs are to be interchangeable in clinical practice.
Although IVAN is a useful clinical tool for the quantification of retinal morphology, it requires manual input, only analyzes specific retinal regions, and has a limited number of measurement parameters. In contrast, RMHAS can automatically measure the entire fundus in addition to a standard area, making it suitable for high-throughput image analysis. For example, IVAN analysis can take >20 min, while RMHAS takes <2 s. 14 In addition to the standard vessel caliber measurements, RMHAS provides additional measurements of tortuosity, length diameter ratio (LDR), junctional exponent deviation (JED), and asymmetry ratio (AR), along with topological information. If RMHAS becomes a popular tool for retinal vessel analysis, it is important that IVAN measurements remain available for pooled analysis. Therefore, a conversion algorithm was proposed to convert between RMHAS and IVAN measurements, which showed good correlation and reduced the mean difference between the software to <1 µm for both CRAE and CRVE. This conversion algorithm provides an opportunity for data pooling analyses and should be utilized by future studies.
The concordance between SIVA and IVAN for retinal-systemic variables has been investigated only once previously, by Yip et al., with results that conflict with the current study. 8 Herein, several retinal-systemic variable correlations differed significantly between software programs, whilst Yip et al.'s P values between RA, SIVA and IVAN approached 1 and had stronger correlation trends (R2 = −0.460 to 0.212) than the current study. 8 Notably, the directions of these associations were also different for sex and lipids. These differences are important, and while Yip et al. used data from the Atherosclerosis Risk in Communities (ARIC) study and a circa 1993 Canon CR-45UAF camera, the current study derived this information from a Topcon TRC-NW6S in a 2022 cohort. It is entirely likely that differences between fundus photos, such as the pixel size and resolution of each camera, could affect their segmentation. Future studies should explore these differences between camera types and software algorithms to determine whether a bridging algorithm is needed to ensure that results are comparable across different types of cameras. Considering the inconsistency between multiple studies addressing retinal-systemic variables, this issue requires prompt attention to ensure studies are comparable with each other. [25][26][27]
This study provides evidence that RMHAS and IVAN have moderate to excellent agreement and supports their interchangeability and generalizability to each other in future and current literature. In addition, an algorithm for converting between IVAN and RMHAS measures was established so that the two could be harmonized for pooled analysis.
Despite these findings, our study has several limitations that should be discussed. First, retinal caliber measurements in RMHAS are based on pixel units while those in IVAN are based on micron units. This may introduce some systematic bias; however, when comparing the two software programs we applied the pixel-to-micron ratio to the RMHAS values, consistent with IVAN. Second, the participants in this study were mainly healthy adults of Chinese ethnicity, and validation in different disease populations and ethnicities should be conducted to confirm agreement between the software programs. Third, we used linear regression models to convert between the RMHAS and IVAN parameters, and linear regression assumes that the residuals are normally distributed. In our study, the sample size was relatively large, which may explain why we did not observe significant deviations from normality. We acknowledge that other regression methods, such as neural networks, may need to be considered for larger datasets. Lastly, considering the importance of pixel and micron analysis for IVAN and RMHAS, the effect of different camera modalities is unknown and should be explored in the future to ensure generalizability between ophthalmoscopes.

Conclusion
In conclusion, this study compared IVAN and RMHAS and found moderate agreement for CRAE and AVR, and excellent agreement for CRVE between software types. A conversion algorithm was established to facilitate the interchangeability of IVAN and RMHAS in future large-scale retinal vessel analyses. Further studies confirming this agreement and interchangeability in large-scale datasets are needed before the software programs are deemed comparable in clinical practice.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
The