Introduction

Fundus photography is integral to the diagnosis, documentation, decision-making and follow-up of diabetic retinopathy (DR). The Early Treatment Diabetic Retinopathy Study (ETDRS) utilized seven standard-field 35-mm colour 30° fundus images (ETDRS 7-field images) [1], which have long been considered the gold standard for the evaluation of DR severity in clinical research [1]. The ETDRS 7-fields cover a total of approximately 90°, or 30%, of the total retinal surface. However, image acquisition requires trained photographers and cooperation from the subjects, and is time consuming. For these reasons, ETDRS 7-field photography is impractical in a clinical setting.

Recent advances in imaging systems, such as ultra-wide-field (UWF) photography, allow better visualization of peripheral retinal lesions. The UWF retinal imaging system covers up to 200° of the retina, or 80% of the total retinal surface, in a single image, in most cases without the need for mydriasis, with a resolution of 14 μm and an acquisition time of 0.25 s. This imaging is accomplished using scanning laser ophthalmoscope technology combined with the unique optical properties of an ellipsoidal mirror. The digital images can be stored easily and retrieved instantaneously, and they provide a reliable method of documentation for patient follow-up examinations. Consequently, reports have described the use of nonmydriatic UWF imaging in the evaluation of DR in comparison to the ETDRS 7-field images, and the two systems showed substantial agreement [2,3,4]. However, UWF photography covers larger areas of the retina traditionally not visualized by the ETDRS fields, allowing simultaneous evaluation of both the posterior pole and the retinal periphery in a single image. This enables better characterization and improved visualization of the presence and distribution of DR lesions across a larger area of the retina. Indeed, the UWF system detected DR lesions outside of the 7-fields in 40% of eyes [4], and about 10% of eyes were graded to have a higher level of DR severity when utilizing the UWF system [5]. The use of UWF thus altered the classification of DR [6, 7].

Although many reports have described the utility of nonmydriatic UWF imaging in the evaluation of DR, no dedicated study to date has examined the inter-observer agreement on grading DR severity using the Optos UWF imaging system. The current study therefore aimed to examine the inter-observer agreement for grading DR on ultra-wide-field fundus photography.

This assessment is essential for clinically differentiating the stages of DR at a tertiary eye care centre. It also helps ensure uniformity in the decision-making process, follow-up plan and treatment of patients with DR. The study aimed to assess the inter-observer agreement in grading DR severity on Optos UWF fundus photographs between two graders of equal experience, rather than the diagnostic capability of retina specialists compared with a gold standard for staging of DR.

Methods and materials

The study was conducted at a tertiary eye hospital in Chennai, south India, after obtaining ethics approval from the Institutional Review Board of the Vision Research Foundation, Chennai, India (Study approval no. 642-2017-P). The study was conducted in accordance with the Declaration of Helsinki. The current study is part of a larger longitudinal study that examines individuals with diabetes and no DR, with a specific focus on those with mild and moderate non-proliferative diabetic retinopathy (NPDR). The participants in the larger study are being followed up annually for four years to identify early ophthalmic measures that can predict future development or worsening of DR. Therefore, the focus was on patients with diabetes with no DR and early stages of NPDR.

Consecutive patients with known diabetes who visited our hospital from March 2018 to December 2019 were screened for eligibility. Adults with type 2 diabetes of at least one year's duration underwent a comprehensive eye examination that included visual acuity testing, refraction, slit lamp examination, intraocular pressure assessment, pupillary dilatation and retinal evaluation by a vitreoretinal surgeon as part of the routine eye examination, and ultra-wide-field fundus photography on Optos. Patient management and referral were based on clinical evaluation and retinal findings on dilated indirect ophthalmoscopy by a single surgeon (RRN) rather than on Optos findings. Individuals were excluded if they had coexisting ocular infection or inflammation, spherical refractive error greater than ±6 D, astigmatism greater than ±3 D, suspected or confirmed glaucoma, ocular hypertension, previous or planned vitreoretinal surgery in at least one eye, retinal vascular diseases other than DR, cataract that precluded fundus examination, or participation in any interventional research trial. Eligible participants who provided written informed consent were included in the study.

Ultra-wide-field (UWF) digital imaging (Daytona plus, Optos Inc, MA, USA) was used for ultra-wide-field fundus photography. Red-green images of the central, superior, inferior, nasal and temporal fields and autofluorescence fundus images of the central field were captured as part of our institutional protocol. A one-shot image centred on the macula frequently has a limited view of the inferior quadrants, which are often obscured by eyebrows, eyelashes, eyelids and artefacts [8]. All fields (except autofluorescence images) were therefore presented to the graders for a better view and additional clarification.

Diabetic retinopathy grading on Optos UWF photographs

De-identified Optos UWF retinal photographs were provided along with a spreadsheet of coded patient IDs for recording the grading. DR grading was done independently by two retina specialists, each with 5 years of experience in the vitreoretina specialty. Each grader received images from a total of 270 patients, stored in a folder with limited access, along with a copy of the proposed international classification of diabetic retinopathy, with no identifiers linking to the medical records of patients. The graders were asked to read the images in order from image number 1 to number 270 under standard ambient room illumination. Graders used a 19-inch DELL liquid crystal display computer monitor with a screen resolution set at 1440 × 900 pixels, 8-bit RGB colour depth and a refresh rate of 59 Hz. Graders could magnify the images but could not adjust the brightness or contrast [9]. Grading was recorded in a password-protected Microsoft Excel (2007) (Microsoft Corporation, Redmond, WA) spreadsheet provided to each grader. To avoid observer fatigue, the graders were asked to grade 25 patients every three weeks. No limit was set on the number graded per day. Data were analyzed using SPSS 25.0 for Windows (SPSS Inc., Chicago, IL, USA).

The Optos UWF fundus photographs were graded based on the International Clinical Diabetic Retinopathy Disease Severity Scale [10] as follows: no apparent DR: no abnormalities; mild NPDR: microaneurysms only; moderate NPDR: more than just microaneurysms but less than severe NPDR; severe NPDR: any of the following, with no signs of proliferative retinopathy: more than 20 intraretinal haemorrhages in each of the four quadrants, definite venous beading in 2+ quadrants, or prominent intraretinal microvascular abnormalities (IRMA) in 1+ quadrant; proliferative DR (PDR): neovascularization, vitreous/preretinal haemorrhage, or both.
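The severity criteria listed above can be read as an ordered decision rule, checked from the most to the least severe grade. The following is a minimal sketch of that reading only; the `Findings` record, its field names and the catch-all `other_npdr_lesions` flag are hypothetical and were not part of the grading workflow used by the graders in this study.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Findings:
    """Lesions recorded for one eye. All field names are hypothetical."""
    microaneurysms: bool = False
    # Intraretinal haemorrhage counts in each of the four quadrants.
    haemorrhages_per_quadrant: Tuple[int, int, int, int] = (0, 0, 0, 0)
    # Assumed catch-all for lesions beyond microaneurysms (e.g. exudates).
    other_npdr_lesions: bool = False
    venous_beading_quadrants: int = 0  # quadrants with definite venous beading
    irma_quadrants: int = 0            # quadrants with prominent IRMA
    neovascularization: bool = False
    vitreous_or_preretinal_haemorrhage: bool = False


def icdr_grade(f: Findings) -> str:
    """Apply the ICDR criteria from most to least severe."""
    if f.neovascularization or f.vitreous_or_preretinal_haemorrhage:
        return "PDR"
    severe = (
        all(n > 20 for n in f.haemorrhages_per_quadrant)
        or f.venous_beading_quadrants >= 2
        or f.irma_quadrants >= 1
    )
    if severe:
        return "severe NPDR"
    more_than_microaneurysms = (
        any(n > 0 for n in f.haemorrhages_per_quadrant)
        or f.other_npdr_lesions
        or f.venous_beading_quadrants > 0
    )
    if more_than_microaneurysms:
        return "moderate NPDR"
    if f.microaneurysms:
        return "mild NPDR"
    return "no apparent DR"
```

Checking PDR first mirrors the requirement that severe NPDR carries no signs of proliferative retinopathy.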

In this study, the International Clinical Diabetic Retinopathy (ICDR) severity scale was used for grading DR. The scale was originally developed for use with ETDRS seven-field images and it applies the Early Treatment Diabetic Retinopathy Study 4:2:1 rule based on scientific evidence. An ICDR diagnosis of ‘no apparent retinopathy’ corresponds to ETDRS Levels 10 and 14; ‘mild NPDR’ corresponds to ETDRS Level 20 (‘microaneurysms only’); ‘moderate NPDR’ corresponds to ETDRS Levels 35, 43 and 47; severe NPDR corresponds to ETDRS Levels 53A–53E (severe NPDR and very severe NPDR); and PDR corresponds to ETDRS Levels 61, 65, 71, 75, 81 and 85. In our study, an ETDRS mask was not used on the UWF images. Instead, the UWF image was assessed as a whole and the ICDR grading scale was applied to the entire image, retaining the original grading system for simplicity [11].
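The correspondence between ETDRS levels and ICDR grades described above amounts to a simple lookup. The sketch below is illustrative only: the dictionary and function names are hypothetical, and the sub-levels 53A to 53E are collapsed into the single integer key 53 as a simplifying assumption.

```python
# ETDRS level -> ICDR grade, as listed in the text.
# Assumption: sub-levels 53A-53E are represented by the integer 53.
ETDRS_TO_ICDR = {
    10: "no apparent retinopathy",
    14: "no apparent retinopathy",
    20: "mild NPDR",
    35: "moderate NPDR",
    43: "moderate NPDR",
    47: "moderate NPDR",
    53: "severe NPDR",
    61: "PDR",
    65: "PDR",
    71: "PDR",
    75: "PDR",
    81: "PDR",
    85: "PDR",
}


def icdr_from_etdrs(level: int) -> str:
    """Return the ICDR grade for an ETDRS level; reject unlisted levels."""
    try:
        return ETDRS_TO_ICDR[level]
    except KeyError:
        raise ValueError(f"ETDRS level {level} is not in the mapping") from None
```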

Sample size estimation

Inter-observer agreement in DR grading between the two graders was assessed using the kappa (k) statistic [9, 12, 13]. A k < 0 was considered ‘no agreement’; k = 0.0–0.19, poor; k = 0.20–0.39, fair; k = 0.40–0.59, moderate; k = 0.60–0.79, substantial; and k = 0.80–1.0, almost perfect agreement. The sample size estimation was based on the k statistic model for 2 raters. For an expected k of 0.7, a prevalence of retinopathy in diabetic patients of 0.2 (20%) [14, 15], a significance level of 0.05 and a power of 0.8, no fewer than 268 images (rounded to 270) were estimated to be required. From the larger sample of 290 baseline-visit images from 290 individuals with diabetes, a random sample of 270 images was chosen for the current study.
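The interpretation bands above can be expressed as a small helper. This is a sketch under one assumption: the published bands leave small gaps (for example between 0.19 and 0.20), so each band is treated here as a half-open interval starting at its lower bound. The function name is hypothetical.

```python
def interpret_kappa(k: float) -> str:
    """Map a kappa value to the agreement label used in this study."""
    if not -1.0 <= k <= 1.0:
        raise ValueError("kappa must lie between -1 and 1")
    if k < 0.0:
        return "no agreement"
    if k < 0.20:
        return "poor"
    if k < 0.40:
        return "fair"
    if k < 0.60:
        return "moderate"
    if k < 0.80:
        return "substantial"
    return "almost perfect"
```

For instance, `interpret_kappa(0.7)`, the expected value used in the sample size estimation, falls in the "substantial" band.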

Results

The mean age of the participants was 60.5 ± 8.4 years (range = 39–87 years), the mean diabetes duration was 11.9 years (range = 1–40 years), and 53% were female. The cross tabulation between the graders is shown in Table 1. Agreement between graders was observed in 229 out of 270 images (84.8%) and disagreement in 41 out of 270 images (15.2%). The unweighted kappa was k = 0.715 (SE = 0.037) and the weighted kappa was k = 0.838 (SE = 0.022). Both graders identified no DR in 170/270 (62.9%) of the images, mild NPDR in 15/270 (5.6%), moderate NPDR in 35/270 (12.9%), severe NPDR in 4/270 (1.48%) and PDR in 5/270 (1.85%).

Table 1 Cross tabulation for grading DR severity between two graders.

With regard to disagreement, Grader 1 identified no DR in an additional 11 (4.1%) patients, while Grader 2 identified no DR in another 4 (1.48%) patients. Grader 1 identified mild NPDR in 11 (4.1%) patients, while Grader 2 identified mild NPDR in another 12 (4.4%) patients. Grader 1 identified moderate NPDR in 18 (6.6%) patients, while Grader 2 identified moderate NPDR in 9 (3.3%) patients. Grader 1 identified severe NPDR in 1 (0.3%) patient, while Grader 2 identified severe NPDR in 15 (5.5%) patients. Grader 1 identified PDR in no additional participants (0.00%), while Grader 2 identified PDR in 1 (0.3%) additional participant.

Of the 11 ‘No DR’ identified by Grader 1, 10 were graded as mild NPDR and 1 as moderate NPDR by Grader 2. Of the 11 additionally identified as having mild NPDR by Grader 1, Grader 2 identified 4 as No DR and 7 as moderate NPDR. Of the additional 18 identified as moderate NPDR by Grader 1, Grader 2 identified 2 as mild NPDR, 15 as severe NPDR and 1 as PDR. On the other hand, of the 5 patients identified as severe NPDR by Grader 1, Grader 2 identified 4 as severe NPDR and 1 as moderate NPDR.
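The agreement statistics can be illustrated from a 5 × 5 cross tabulation. In the sketch below, the table is reconstructed from the counts reported in the Results text (it is not copied from Table 1), and linear disagreement weights are assumed for the weighted kappa because the weighting scheme is not stated. The names `TABLE` and `cohen_kappa` are hypothetical.

```python
GRADES = ["No DR", "Mild NPDR", "Moderate NPDR", "Severe NPDR", "PDR"]

# Rows: Grader 1; columns: Grader 2.
# Reconstructed from the counts in the text, not taken from Table 1.
TABLE = [
    [170, 10, 1, 0, 0],
    [4, 15, 7, 0, 0],
    [0, 2, 35, 15, 1],
    [0, 0, 1, 4, 0],
    [0, 0, 0, 0, 5],
]


def cohen_kappa(table, weighted=False):
    """Cohen's kappa for a square table; linear weights if weighted=True."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        if weighted:
            return abs(i - j) / (k - 1)  # assumed linear weighting
        return 0.0 if i == j else 1.0

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(disagreement(i, j) * table[i][j] for i, j in cells) / n
    expected = sum(
        disagreement(i, j) * row_tot[i] * col_tot[j] for i, j in cells
    ) / n ** 2
    return 1.0 - observed / expected


if __name__ == "__main__":
    print("unweighted kappa:", round(cohen_kappa(TABLE), 3))
    print("linear-weighted kappa:", round(cohen_kappa(TABLE, weighted=True), 3))
```

Under this reconstruction the diagonal sums to 229 of 270 images, and the linear-weighted value comes out close to the reported weighted kappa. The unweighted value is about 0.71, slightly below the reported 0.715, so the reconstructed table should be treated as approximate.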

In order to understand the patient-related factors for disagreement in grading, a univariate binary logistic regression was performed with disagreement versus agreement as the outcome variable (Table 2). Disagreement was not related to patient’s age (OR = 1.014, 95% CI: 0.966, 1.064, p = 0.574), female gender (OR = 0.512, 95% CI: 0.197, 1.331, p = 0.169), duration of diabetes (OR = 0.987, 95% CI: 0.931, 1.046, p = 0.660) or the lens being phakic or pseudophakic (OR = 1.67, 95% CI: 0.802, 3.480, p = 0.171).
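For a single binary predictor (such as phakic versus pseudophakic lens status), the odds ratio from a univariate logistic regression equals the cross-product ratio of the corresponding 2 × 2 table, and its Wald 95% confidence interval follows from the standard error of the log odds ratio. The sketch below illustrates this relationship only; the function name is hypothetical and the counts in the usage example are invented toy values, not study data.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2 x 2 table.

    a: predictor present, disagreement   b: predictor present, agreement
    c: predictor absent, disagreement    d: predictor absent, agreement
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (
        math.exp(log_or),
        math.exp(log_or - z * se),
        math.exp(log_or + z * se),
    )


if __name__ == "__main__":
    # Toy counts for illustration only (NOT from this study).
    or_, lower, upper = odds_ratio_wald(12, 8, 6, 14)
    print(f"OR = {or_:.2f}, 95% CI: {lower:.2f}, {upper:.2f}")
```

A confidence interval that includes 1, as for all four factors reported above, is consistent with no significant association at the 5% level.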

Table 2 Patient-related factors: univariate logistic regression: agreement (n = 229) vs disagreement (n = 41).

To understand the grader-related factors for disagreement in grading, the images were split into three equal groups in the order of grading. The kappa between graders was k = 0.727 for the first set, k = 0.750 for the second set, and k = 0.641 for the third set. No increasing trend was seen, which argues against a learning-curve effect (Table 3).
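The block-wise analysis described above can be sketched as follows: the paired grades are kept in grading order, cut into equal consecutive blocks, and an unweighted kappa is computed within each block. The function names are hypothetical and the ratings in the usage example are toy values, not study data. With 270 images and three blocks, each block would hold 90 images.

```python
from collections import Counter


def cohen_kappa(r1, r2):
    """Unweighted Cohen's kappa for two equal-length lists of grades."""
    n = len(r1)
    p_obs = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    p_exp = sum(c1[g] * c2[g] for g in c1) / n ** 2
    # Undefined (division by zero) if both graders use one identical grade.
    return (p_obs - p_exp) / (1 - p_exp)


def kappa_by_block(r1, r2, blocks=3):
    """Kappa within each consecutive block, in grading order.

    Trailing images are dropped if the total is not divisible by `blocks`.
    """
    size = len(r1) // blocks
    return [
        cohen_kappa(r1[i * size:(i + 1) * size], r2[i * size:(i + 1) * size])
        for i in range(blocks)
    ]


if __name__ == "__main__":
    # Toy grades for illustration only (NOT from this study).
    grader1 = [0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1]
    grader2 = [0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0]
    print(kappa_by_block(grader1, grader2))
```

A steady rise in kappa from the first to the last block would suggest a learning-curve effect; the values reported above show no such rise.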

Table 3 Grader-related factors: agreement between two graders.

Discussion

The study examined the inter-observer agreement for grading the severity of DR using the Optos UWF retinal imaging system. Since the Optos UWF is reported to have a low resolution for the detection of small retinal lesions [16], it was anticipated that discrepancies between graders on UWF retinal photographs would arise mainly in identifying microaneurysms rather than other lesions of DR. However, when comparing Grader 1 with Grader 2, differences were observed in the proportions of no DR (4.1 vs. 1.48%), mild NPDR (4.1 vs. 4.4%), moderate NPDR (6.6 vs. 3.3%), severe NPDR (0.3 vs. 5.5%) and PDR (0.00 vs. 0.3%), both in differentiating no DR from mild DR and in differentiating moderate from severe NPDR.

On a general note, overestimating the severity of DR as moderate or severe may only lead to further confirmatory investigations and/or an eye examination earlier than required. Underestimation of potentially referable DR would be a problem, but only in a clinical setting in which Optos images are used for clinical advice and referral of patients. In the current study, patients underwent dilated retinal examination with indirect ophthalmoscopy by a single surgeon (RRN), and patient advice and referral were based on clinical findings and not on Optos findings. Nonetheless, the study finding that there are inter-observer differences in grading DR on Optos is an important factor to account for in research studies that use more than one grader.

Additionally, the inter-observer differences in grading were not related to patient factors or to a learning curve of the graders. Therefore, some alternative explanations may be considered. The fundus image acquired with an Optos UWF system is a pseudocolour image formed by combining monochromatic red and green scanning laser images. The green laser channel scans from the sensory retina to the retinal pigment epithelium, while the red laser channel scans the deeper structures, from the pigment epithelium to the choroid. The pseudocolour image thus differs from a true colour image and may affect the evaluation of DR severity [17]. In addition, peripheral distortion, decreased resolution of the far temporal and nasal peripheral retina [18], and artefacts due to eyelashes may interfere with the clarity of imaging and interpretation.

The Early Treatment Diabetic Retinopathy Study (ETDRS) fundus photography protocol is still regarded as the gold standard for identifying DR in clinical trials [19]. We used the International Clinical DR (ICDR) severity scale as it is a simplified grading scale and has been used in many reports on DR [11, 17, 19, 20]. Recent studies [9, 17, 18, 19, 20] have utilized the ICDR grading scale along with the ETDRS grading system for comparing Optos UWF photography with that of Clarus [17] and Topcon [19]. It was reported that the Optos system may be a viable option for assessments requiring wider retinal imaging ranges, such as when using the ICDR scale. As a result, the severity of DR would be expected to be greater in Optos images when using the ICDR scale because of the wider field captured.

Price et al. [12] compared the ETDRS seven-field view and the wider UWF view and showed that 19% of images had a discrepancy, with 15% showing greater severity of DR. Other studies observed that 41% to two-thirds of images had peripheral lesions outside of the ETDRS seven fields [2, 4]. In our study, of the 270 images, disagreement between graders was observed in 41 images (15.2%). It is likely that some of those images had peripheral lesions. Lesions in the periphery may not be as sharply in focus as lesions in the posterior pole because of peripheral distortion and decreased resolution in the far temporal and nasal peripheral retina [18]. This could be one of several explanations for the discrepancy between the two graders in our study.

The Optos UWF imaging system was chosen in the larger study because of the wider field of view it offers when compared to other retinal photographic modalities [21, 22]. In our study, Optos UWF images were captured after the patient underwent dilated retinal examination by an ophthalmologist to rule out other retinal conditions. Although ultra-wide-field imaging can cover about 180 to 200 degrees of the retina without the need for mydriasis in most situations, it may still suffer from inherent optical distortion and from colour variation due to the scanning laser. In addition, spherical aberration of the ellipsoidal mirror and the spherical curvature of the eye in the periphery may result in less sharply focused areas [23]. Mydriatic images acquired using the Optos UWF are reported to have better quality than images taken without mydriasis [24], perhaps because mydriasis reduces some of the limitations of nonmydriatic UWF photography [4].

The current study involved institution-based grading by retina specialists, who are already effective interpreters and do not require additional training, unlike general ophthalmologists or physicians, who would require additional comprehensive training [7]. The current study is part of a larger longitudinal study that examines individuals with diabetes with no DR and NPDR, who are being followed up annually for four years to elucidate early ophthalmic measures that may predict development or worsening of DR. Therefore, the focus was on no DR and early stages of NPDR. This led to a smaller sample of patients with severe NPDR and PDR. The skewed distribution of DR severity in the study could be the reason for observing (falsely) high kappa values, which should therefore be interpreted with caution. Future studies with comparable sample sizes for all categories of DR may be required.

Although a few recent studies [11, 17, 19, 20] have utilized the ICDR grading system on UWF images, the clinical data are still limited. The identification of discrepancies between graders in grading DR on UWF photographs suggests the need for additional validation. In addition, examining the diagnostic capability of retina specialists in comparison to that of a senior vitreoretinal surgeon on Optos and indirect ophthalmoscopy may be valuable.

In conclusion, although a UWF retinal imaging system provides larger retinal coverage, there appears to be disagreement between observers in identifying various grades of DR. This needs to be taken into consideration when utilizing more than one grader in clinical studies. Further studies in larger cohorts of individuals with comparable sample sizes in the severe NPDR and PDR categories may also be required.

Summary

What was known before

  • Many reports have described the utility of nonmydriatic UWF imaging in the evaluation of DR. There appears to be no dedicated study that has examined the inter-observer agreement on grading DR severity using the Optos UWF imaging system.

What this study adds

  • We examined whether there are any differences between graders in detecting some of the earliest lesions in DR on the Optos UWF imaging system.

  • This study specifically focused on patients with no diabetic retinopathy (DR) and early stages of DR and observed that differences between two graders exist in grading no DR as well as for other stages of DR.