Comparison of Photographic Screening Methods for Diabetic Retinopathy – A Meta-analysis

ABSTRACT Introduction Diabetic Retinopathy (DR) is a leading cause of irreversible visual impairment and blindness in both developed and developing countries. Although the merits of DR screening are well recognized, significant variations in screening practices including imaging modality still exist. Purpose To evaluate and compare the sensitivity and specificity of mydriatic and non-mydriatic photographic screening methods using 7-Field fundus photography or dilated fundus examination (DFE) by an ophthalmologist as reference standard. Methods A systematic review using PRISMA Guidelines was conducted by online search of MEDLINE, Web of Science, and other repositories of all available studies from 1990 until 2019. A total of 62 studies were included in the meta-analysis from a total of 406 suitable abstracts screened and 95 articles reviewed in full. Data were collected using a standardized extraction form independently, with all authors masked to others’ search results. Results For the detection of any DR (ADR), sensitivity ranged from 81% with single field to a maximum of 99% for 4–7 fields and wide-angle images. For detection of referable DR (RDR) sensitivity ranged from 76% for single field to 93% for wide-angle photography. Specificity was lowest at 91% for wide-angle images and greatest at 99% for three-field photography. Study heterogeneity was noted to be significant, which was partly attributed to the range of DR classification between studies. Conclusions The sensitivity and specificity of DR screening are positively associated with number of photographic fields. Pooled estimates suggest non-mydriatic two-field photography may be sufficient for screening detection of ADR and RDR.


Background
Diabetes mellitus (DM) is a leading cause of irreversible visual impairment and blindness worldwide. 1,2 The current worldwide prevalence of DM is projected to double to 366 million (4.8%) before the year 2030. 3 Current evidence-based diabetes eye care guidelines are highly effective at preserving vision and preventing vision loss from diabetic retinopathy (DR). As diabetic eye disease is largely asymptomatic in its most treatable stages, regular eye examinations of every person with DM are needed.
Diabetic eye examinations typically include assessment using Early Treatment Diabetic Retinopathy Study (ETDRS) 7-Field 30-degree fundus photography or clinical dilated fundus examination (DFE). These are considered clinical gold-standard examinations for assessing DR. 4,5 ETDRS photography typically requires trained photographers or technicians, while fundoscopy requires attendance by an experienced ophthalmologist, and both require pharmacologic pupil dilation. A number of studies, however, have suggested that less resource-intensive alternatives might be similarly effective, which is particularly relevant for resource-poor settings. Against this background, we appraised the existing evidence surrounding screening instruments and other operational parameters such as field number, size and mydriatic status, with the aim of comparing and evaluating the screening accuracy of different photography techniques against 7-Field ETDRS or DFE by an ophthalmologist as the reference standard for the detection of DR.

Method
The study protocol was registered on PROSPERO (http://www.crd.york.ac.uk/PROSPERO; registration number CRD42015024142) prior to commencement.
Studies were eligible if they examined the accuracy of any DR photographic screening method among patients with Type 1 or 2 DM or DR, and compared photography with either 7-Field ETDRS or DFE assessed by a consultant ophthalmologist or equivalent specialist. All participants in included studies, not only those who were screen-positive and followed up, were required to have undergone the reference examination. Studies evaluating automated diagnosis or algorithm techniques were excluded from the final analysis, as were studies without available complete texts, or where inadequate data were available for the calculation of specificity and sensitivity values. Systematic Reviews, and the Rural and Remote Health Database (RURAL) of all available studies across the period from 1990 to 2019. A flowchart of the search strategy and outcomes is presented in Figure 1. The search was conducted in accordance with PRISMA Statement guidelines, which is a validated tool for ensuring transparent and complete reporting of systematic reviews and meta-analyses. 6 The search strategy comprised keywords for Diabetic Retinopathy, DR-specific screening terms and generic terms as outlined in Supplementary Table S1. The search was initially carried out independently by WY, RC and KF. All three authors were masked to the results of each other's searches. The titles and abstracts of the pooled search were subsequently reviewed by WY, and complete texts were obtained for those meeting the inclusion criteria. Full texts were reviewed independently by WY, MM, RC and KF to determine further suitability for inclusion. At each of the review steps, any concerns or disagreements regarding the suitability of a publication for inclusion were resolved by consensus or by consulting RF, an experienced ophthalmologist.

Search strategy
References of included papers were further screened by WY and assessed for eligibility to obtain the final list of included studies. Reviewers were not masked to any of the study characteristics.

Inclusion and exclusion criteria
All published studies up to and including November 2019 evaluating the diagnostic accuracy of DR screening methods and with English full texts were considered for inclusion. The primary focus was to compare diagnostic accuracy of different photography techniques (pupil dilation, number of fields, size of fields) rather than the cadres of photographer and grader. The reference standard diagnosis of DR was obtained using DFE or 7-field ETDRS photography as assessed by a consultant ophthalmologist or equivalent grader. As a rule, the reference examination had to be applied to the entire study group of an included study, so that sensitivity and specificity values could be calculated accurately from complete cell counts, or the characteristics of the excluded participants had to be clearly stated. Participants with and without DR must have been included in the screening population. Studies were excluded from analysis for the following reasons: (1) Insufficient data to calculate sensitivity and specificity or confidence intervals (CIs) after attempts to contact authors; (2) Case-control or two-gate methodology as study design; (3) Video images without fundus photography assessed as primary screening method; (4) Automated diagnosis algorithms assessed as primary screening method; (5) Optical coherence tomography assessed as primary screening method; (6) Reported results averaged over multiple assessors. Studies were not excluded on the basis of qualifications of screening image assessor, or subject age, race or diabetes status (Type 1 vs. Type 2).

Data collection
Data were collected independently by WY, MM, RC and KF using a standardized extraction form, and all authors were masked to each other's search results. The extraction form included the following items: study year, study location, mydriatic status, screening instrument, size and number of fields, format (digital vs. colour), reference standard (7-Field ETDRS vs. DFE), and experience of grader (ophthalmologist, trained grader). Study authors were contacted if data necessary for the calculation of sensitivity and specificity were omitted in papers considered otherwise suitable for inclusion. The units of assessment (per person, per eye) and level of DR detected (any DR, referable DR) were also noted. The numbers of true positives, false positives, true negatives and false negatives were calculated for each level of DR. Ungradable cases were excluded to reduce heterogeneity between studies, as not all studies included information on the numbers of ungradable cases. Where multiple estimates from one study were available for any single meta-analysis, the priority for inclusion was given to the following characteristics: non-mydriatic images (vs. use of mydriatic agents); colour images (rather than red free); digital image capture (vs. film); non-portable devices; gold standard of seven-field ETDRS images (vs. ophthalmoscopy); steered wide-angle images; and analysis per eye (vs. per person).

Diabetic retinopathy classification
The presence of any DR was defined as mild non-proliferative DR (NPDR) or greater, including microaneurysms only, corresponding to an ETDRS level of 20 or greater using the Modified Airlie House Classification. 7 Referable DR (RDR) was defined as the presence of moderate NPDR (i.e. haemorrhages or microaneurysms in at least one quadrant, together with one or more findings of cotton wool spots, venous beading, or intraretinal microvascular abnormalities) or greater, corresponding to an ETDRS DR score of ≥ 43. This definition was selected because the 'referable' standard for DR in several studies included 'moderate to severe' NPDR, and these patients all received further ophthalmic assessment. DR disease assessment based on solitary findings of macular oedema was not considered.

Assessment of bias
The QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies 2) tool provides a transparent approach to appraising the risk of bias and applicability of studies that are included in a systematic review evaluating diagnostic accuracy. 8 The QUADAS-2 characteristics are shown in Supplementary Table S2 and represent an overall low risk of bias in patient selection, choice of index test, choice of reference standard and study flow for the final list of included studies.

Statistical analysis
Estimates of sensitivity, specificity, diagnostic odds ratio and the positive and negative likelihood ratio were calculated to assess diagnostic accuracy of screening techniques. Sensitivities and specificities with CI were calculated from study data to two decimal places. Any cell count of zero was replaced by 0.5 before estimates were produced. Estimates were calculated using random-effects meta-analysis and pooled according to diagnosis threshold (Any DR, RDR) and the number of screening images assessed per eye (categorised as 1, 2, 3 or 4-7). The rationale for dividing the analysis into these four categories was based on the total numbers of eligible studies that included single, 2, 3 and 4 or more field techniques. The majority of eligible studies were of 1 or 2-field technique. Because of the paucity of studies of four or more fields, these were combined to explore further trends in sensitivity and specificity with increasing number of fields. Wide-angle images (200 degrees) were assessed separately. The I² statistic was calculated to quantify statistical heterogeneity between studies. Deeks' funnel plot asymmetry test was used to investigate small sample effects. 9 In this test, the inverse of the square root of the effective sample size, considered a measure of variance, was plotted against the logarithm of the diagnostic odds ratio. A test of the slope of the corresponding regression line was further performed. 9 Univariate random-effects meta-regression was used to investigate statistical heterogeneity among studies of either one or two image fields.
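As an illustration of how the per-study accuracy measures listed above follow from a single 2x2 table, a minimal Python sketch is given below. It is not the Stata code used for the analysis: the function name `diagnostic_accuracy` and the example counts are hypothetical, and the zero-cell handling simply follows the rule stated above (a zero count is replaced by 0.5).

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Accuracy measures for one study: index test vs. reference standard.

    tp, fp, fn, tn are the true-positive, false-positive, false-negative
    and true-negative counts (hypothetical inputs for illustration).
    """
    # Replace any zero cell by 0.5 so that ratios remain defined.
    tp, fp, fn, tn = (c if c > 0 else 0.5 for c in (tp, fp, fn, tn))

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    diagnostic_odds_ratio = (tp * tn) / (fp * fn)  # equals LR+ / LR-

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": lr_positive,
        "lr_negative": lr_negative,
        "dor": diagnostic_odds_ratio,
    }


# Hypothetical study: 100 eyes with DR and 100 without on the reference standard.
example = diagnostic_accuracy(tp=90, fp=10, fn=10, tn=90)
```

In this hypothetical table, sensitivity and specificity are both 90%, giving a positive likelihood ratio of about 9, a negative likelihood ratio of about 0.11 and a diagnostic odds ratio of 81.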
The impact of the following characteristics on sensitivity and specificity was assessed within each category of diagnosis threshold and number of images: size of image field (20-35, 45 or 50-60 degrees); use of mydriatic agents; year of publication (in approximate quartile groupings); location of participants categorised as Europe (UK, Denmark, France, Germany, Italy, Spain, Sweden and the Netherlands), North America (USA, Canada and Barbados), Asia (China, India, Thailand, Taiwan and Hong Kong), Middle East (Egypt, Israel and Lebanon) and Australia; digital or film image capture; method of gold standard assessment; and diagnosis per eye or per person. All available estimates from each study were included (excluding wide-angle images), and robust variance estimation was used to account for the correlated non-independent estimates within each study. 10 Statistical analysis was conducted using Stata/IC 13.1 software (College Station, TX, USA).

Any diabetic retinopathy
For images less than 200 degrees in diameter (field width measured from centre of eye), there was a trend of increasing sensitivity with the number of fields captured, although specificity remained fairly stable across number of fields (Table 1). For detection of any DR (ADR), sensitivity ranged from 81% with single field (95% CI 74-87%) to a maximum of 98% and 99% for 4-7 fields and wide-angle images, respectively (95% CI 93-99% and 94-100%). Specificity was greatest using 3-field photography for the detection of ADR (95%, 95% CI 91-98%) and lowest for wide-angle images (79%, 95% CI 63-89%). The diagnostic odds ratio demonstrated good performance for all categories, improving with increasing number of fields graded. The meta-analysis estimate of positive likelihood ratio was 4.6 for wide-angle images and greater than 9.2 for all other categories, suggesting a positive-screen result is highly indicative of disease. The negative likelihood ratio showed greater performance as the number of fields increased and was less than 0.2 for screening using four or more fields or wide-angle imaging. There was a high degree of heterogeneity as measured by I² for each meta-analysis (85-99%, see Table 1). This was attributed to a greater degree of variability between studies than within studies. Most studies had small within-study variance. The meta-analyses of one, two and three fields had sufficient studies to investigate small study effects through examination of funnel plots. No substantial funnel plot asymmetry was discovered (Supplementary Figures S1-S5).
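To illustrate why small within-study variance can produce a high I² even when study estimates are not far apart, the sketch below computes I² from Cochran's Q using the usual inverse-variance definition. It is an illustration under stated assumptions rather than the analysis code: the function name `i_squared` is hypothetical, and the study estimates and variances are invented values on an arbitrary (e.g. logit) scale.

```python
def i_squared(estimates, variances):
    """I² (in percent) from study estimates and their within-study variances."""
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical estimates from three studies (same values in both scenarios).
estimates = [1.0, 2.0, 3.0]
precise_studies = i_squared(estimates, [0.01, 0.01, 0.01])   # small within-study variance
imprecise_studies = i_squared(estimates, [1.0, 1.0, 1.0])    # large within-study variance
```

With identical point estimates, the precise studies give an I² of about 99%, whereas the imprecise studies give 0%, which is consistent with attributing high I² values to between-study variability that is large relative to within-study variance.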

Referable diabetic retinopathy
There was a high degree of heterogeneity for each meta-analysis as estimated by the I² statistic (77-99%, see Table 2), which is partially attributed to a small within-study variance in most studies. There was an overall trend of increasing sensitivity with increasing number of fields, with the exception of studies using three fields, where sensitivity was estimated to be 78% (95% CI 63-88%, Table 2) compared with 85% and 92% for two fields and 4-7 fields, respectively. Furthermore, the sensitivity of three-field photography was approximately equal to that of single-field photography (78% vs. 76%, 95% CI 69-82%). Sensitivity was greatest for 4-7 fields (92%, 95% CI 86-96%) and wide-angle photography (93%, 95% CI 87-96%). Specificity was greatest for RDR using 3-field photography (99%, 95% CI 95-100%) and lowest for wide-angle images (91%, 95% CI 84-95%). For each meta-analysis (one-field, two-fields, etc.), specificity was generally high (greater than 90%) with a large diagnostic odds ratio that increased with the number of fields assessed. The positive likelihood ratio was greater than 10 for all meta-analyses. There was some asymmetry noted for the funnel plot in the meta-analysis of screening with four to seven fields (Supplementary Figure S9).
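The funnel plot asymmetry test described in the statistical analysis can be sketched as a weighted regression of the log diagnostic odds ratio on the inverse square root of the effective sample size. The sketch below assumes the commonly used formulation in which the effective sample size is 4·n1·n2/(n1 + n2), with n1 and n2 the numbers of diseased and non-diseased participants, and the regression is weighted by effective sample size. The function name `deeks_regression` and the example tables are hypothetical, and the significance test of the slope is omitted for brevity.

```python
import math


def deeks_regression(tables):
    """Weighted regression of ln(DOR) on 1/sqrt(ESS).

    tables: list of (tp, fp, fn, tn) counts, one tuple per study.
    Returns (slope, intercept); a slope far from zero suggests asymmetry.
    """
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in tables:
        n_diseased, n_healthy = tp + fn, fp + tn
        ess = 4.0 * n_diseased * n_healthy / (n_diseased + n_healthy)
        # Replace zero cells by 0.5 before taking the odds ratio.
        tp, fp, fn, tn = (c if c > 0 else 0.5 for c in (tp, fp, fn, tn))
        xs.append(1.0 / math.sqrt(ess))
        ys.append(math.log((tp * tn) / (fp * fn)))
        ws.append(ess)

    # Weighted least squares for a single predictor.
    w_sum = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / w_sum
    y_bar = sum(w * y for w, y in zip(ws, ys)) / w_sum
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical studies of different sizes with the same diagnostic odds ratio.
slope, intercept = deeks_regression([(90, 10, 10, 90), (45, 5, 5, 45), (18, 2, 2, 18)])
```

When studies of different sizes report the same diagnostic odds ratio, as in this hypothetical example, the fitted slope is essentially zero; a systematically larger odds ratio in smaller studies would instead produce a positive slope.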

Meta-regression
Newer publications reported higher sensitivity when assessing any DR but not when assessing referable DR.
Other characteristics (region of study, image size, mydriasis, photography medium) did not appear to have a consistent impact on specificity (Supplementary Tables S5-S8). Meta-regression analyses by change in I² value for the detection of ADR using single or two fields less than 200 degrees revealed that no differences in sensitivity or specificity were explained by differences in region of study (Europe, North America, Asia, Middle East, Southern Hemisphere), year of study (1992-2001, 2002-2006, 2007-2009, 2010-2017), mydriatic status (non-mydriatic vs. mydriatic) or medium of photography (digital vs. film). For the detection of RDR using a single image field less than 200 degrees, some variance in sensitivity between studies was explained by the difference in image size (45 degrees vs. 50-60 degrees) and type of gold standard examination (7-Field ETDRS vs. DFE), as measured by a decrease in the I² value. For the detection of RDR using two image fields less than 200 degrees, study variance in sensitivity was explained by differences in mydriatic status (non-mydriatic vs. mydriatic) and gold standard examination (7-Field ETDRS vs. DFE), and a degree of variance in specificity was attributed to studies completed in the time period from 2007 to 2009.

Diagnostic likelihood ratios
For the diagnosis of ADR, all screening techniques with the exception of wide-angle images had a positive likelihood ratio (LR+) greater than 9.2. For screening with wide-angle images, the LR+ was 4.6, implying that a positive screening result moderately increases the likelihood of having ADR. There was a trend towards greater performance in negative likelihood ratios (LR-) with increasing number of fields. For screening with four or more fields or wide-angle images, the LR- was 0.02. For the diagnosis of RDR, all screening techniques including wide-angle images had an LR+ greater than 10. As with ADR, LR- performance for RDR was greatest for screening with four or more fields or wide-angle images; the LR- values for these were 0.08 and 0.07, respectively, implying a very strong association between a negative screen result and the absence of disease.
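The practical meaning of these likelihood ratios can be illustrated by converting a pre-test probability into a post-test probability via odds. The sketch below is a worked illustration only: the function name `post_test_probability` is hypothetical, and the 30% pre-test probability of ADR is an assumed value chosen for the example, not an estimate from the included studies. The likelihood ratios of 4.6 and 0.02 are those reported above for wide-angle imaging.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Update a disease probability using a likelihood ratio (Bayes' rule on odds)."""
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


ASSUMED_PREVALENCE = 0.30  # hypothetical pre-test probability of any DR

# Wide-angle imaging for ADR, using the likelihood ratios reported in the text.
after_positive_screen = post_test_probability(ASSUMED_PREVALENCE, 4.6)
after_negative_screen = post_test_probability(ASSUMED_PREVALENCE, 0.02)
```

Under this assumed prevalence, a positive wide-angle screen raises the probability of ADR from 30% to roughly 66%, while a negative screen lowers it to below 1%.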

Discussion
The present study demonstrated an association between the number of photographic fields and sensitivity of DR photography screening, with a high specificity for almost all techniques assessed. The British Diabetic Association (Diabetes UK) recommends that all DR screening programs achieve minimum sensitivities and specificities of 80% and 95% respectively, although other reports in the literature have suggested a minimum sensitivity of 60% is sufficient. 11,12 Based on these cut-offs, two-field photography centred on the optic disc and macula without pupil dilation is sufficiently accurate for the detection of ADR and RDR. Whether single-image wide-angle photography might outperform this remains to be seen at present. Although one would expect a positive association between number and size of photographed images and diagnostic accuracy, the results of this study present additional data for health planners. The pooled estimates from this meta-analysis of published DR photography studies suggest that two-field photography may be sufficient to satisfy both the sensitivity and specificity criteria (80% and 95%, respectively) of a DR screening program for detecting ADR and RDR. Specificity was greatest using three-field photography for the detection of ADR and lowest for wide-angle images. For RDR, specificity was greatest for three-field photography and lowest for wide-angle images. Specificity for detection of ADR was poor overall with no screening technique found to satisfy the 95% specificity screening requirement. Conversely, the specificity for detection of RDR was excellent with all techniques except wide-angle imaging having specificities of greater than 95%.
As diabetic eye changes are typically asymptomatic in their early stages, patients with DR may not notice subtle vision changes and therefore may not seek treatment in a timely manner. 13 Previous research investigating barriers to attending DR screening has highlighted that lack of awareness or poor education regarding diabetic eye care and complications, access to specialist services, aversion to being judged for poor glycaemic control, fear of ophthalmological treatment and denial of primary diabetes diagnosis are major barriers and concerns for diabetic patients. 14-20 Although the merits of DR screening are well-recognized, there is a significant variation in screening methods (i.e. conventional fundus examination vs. digital colour fundus photography vs. fluorescein angiography) and an overall lack of uniformity in ophthalmoscopy technique and imaging modalities used for DR screening. 21 It is estimated that in the USA alone, a universal DR screening program adhering to current guidelines would generate over 32 million images for grading per year. 22 Recent improvements in the accuracy of automated DR detection present 'store-and-forward' artificial intelligence (AI) screening as a feasible alternative to reduce the cost and specialist burden of DR screening in underprivileged settings. 23 Although this model may in the future remove variability and workload at the image grading end, it is necessary to understand the optimum quality and number of images that are required from patients to minimize onsite resource and time expenditures.
With regard to the strength of the positive diagnostic likelihood ratios found in this study, it is worth noting that, excluding wide-angle images, a positive screening result for all other screening techniques was effectively conclusive for the presence of any DR. For screening with four or more images, or with wide-angle images, the strength of the negative diagnostic likelihood ratios implies that a negative result is strongly suggestive of disease absence. In regard to the reduced positive diagnostic likelihood ratios for wide-angle images, the results of our study are in keeping with previous research demonstrating that wide-field imaging (> 200 degrees) tends to 'over-diagnose' DR severity in approximately 20% of cases, particularly for the detection of peripheral retinal lesions, when compared with reference standard 7-field ETDRS image grading. 24,25 Similarly, the strength of positive diagnostic likelihood ratios for the detection of referable DR, including wide-angle images, also suggests that a positive screening result was effectively conclusive for the presence of DR. This presents considerations for how positive-screened DR cases should be referred or followed up, that is, whether they should be followed up by a qualified eye care provider or referred directly to a retinal specialist on the probability that they have disease, where treatments such as timely laser photocoagulation could be initiated on-site or during the same consultation.
The finding of comparatively lower specificity for wide-field imaging can be explained by its capacity to capture areas of peripheral retina outside of the standard ETDRS 7 fields. The detection of peripheral ischaemia and lesions therefore prompts a greater severity diagnosis of DR compared to reference standards, resulting in a lower specificity. 26 The significance of this difference is being investigated in terms of the correlation of predominantly peripheral retinal lesions and ischaemia with a greater risk of progression towards proliferative DR and diabetic macular oedema. 24,27 Another possible explanation for the lower specificity found in wide-field imaging studies is that there is a negative bias from evaluating this technique within cohorts of known diabetic subjects rather than in a primary screening context, as was the case for included studies.

Strengths and limitations
To the best of our knowledge, the present study provides the most comprehensive systematic review of published DR screening studies in the literature to date. Strengths of this study included the thorough and systematic search procedure, inclusion criteria and data extraction of DR grades of severity in addition to the presence or absence of ADR. 28 Screen-negative participants in all included studies underwent gold-standard reference examination to provide accurate sensitivity and specificity data, and where these values were not quoted in the full text, values were manually calculated using pooled data tables with written requests to authors for supplementary data as necessary. However, there are important study limitations which must be acknowledged. Firstly, the significant heterogeneity of included studies as indicated by the I² statistic can be attributed to varying DR classification schemes (ADR vs. ETDRS Level) as well as generalised changes to DR screening practices over the study period, although, for instance, no study differences were found between digital and film photography modalities in the meta-regression analysis. This study did not, however, include or assess for the presence of diabetic macular oedema specifically. Secondly, cases of ungradable images were excluded from sensitivity and specificity calculations, though in practice one would assume that these would be treated as positive cases in a screening program. A high proportion of participants with ungradable images without DR would result in an overestimation of the specificity in this review. Furthermore, studies primarily evaluating the accuracy of screening using video or automated detection techniques, or solely assessing diabetic macular oedema, were excluded from analysis as the primary outcome was to evaluate effectiveness of photography methods and detection of DR, respectively.
Monitor image resolution was not controlled for in the analysis and may be a possible confounder. Finally, some sources of heterogeneity may be strongly associated with differences in screening setting, making between-study comparisons less reliable than within-study direct comparisons. Screening is a complex intervention, and using photographic field technique to assess heterogeneity may conceal other sources affecting between-study comparisons, such as technician qualification and experience, technician preferences, quality of the camera lenses and ambient light during image acquisition. However, based on study selection, review methodology, estimates of heterogeneity and the meta-regressions employed, significant bias in our results is unlikely.

Conclusion
The sensitivity and specificity of photographic DR screening appear to be positively associated with number of photographic fields. Mydriatic status and photography medium did not affect sensitivity and specificity in meta-regression analyses. Furthermore, this review may guide healthcare planners in scoping the most effective photographic parameters for implementing a DR population screening service.

Acknowledgments
CERA receives Operational Infrastructure Support from the Victorian Government.
Dr. William Yan and Dr. Myra McGuinness are the guarantors of this work and, as such, had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
The author(s) reported there is no funding associated with the work featured in this article.