Objective
Diseases such as age-related macular degeneration (AMD) are classified based on human rubrics that are prone to bias. Supervised neural networks trained using human-generated labels require labor-intensive annotations and are restricted to the specific tasks on which they were trained. Here, we trained a self-supervised deep learning network using unlabeled fundus images, enabling data-driven feature classification of AMD severity and discovery of ocular phenotypes.
Design
Development of a self-supervised training pipeline to enable grading of AMD severity using fundus photographs from the Age-Related Eye Disease Study (AREDS).
Subjects
100,848 human-graded fundus images from 4,757 AREDS participants between 55 and 80 years of age.
Methods
We trained a deep neural network with self-supervised Non-Parametric Instance Discrimination (NPID) using AREDS fundus images without labels, then evaluated its performance in grading AMD severity under 2-step, 4-step, and 9-step classification schemes with a supervised classifier. We compared balanced and unbalanced accuracies of NPID against those of supervised-trained networks and ophthalmologists, explored network behavior using hierarchical learning of image subsets and spherical k-means clustering of feature vectors, then searched for ocular features that can be identified without labels.
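Spherical k-means clusters L2-normalized feature vectors by cosine similarity, with centroids re-projected onto the unit sphere after each update. The abstract does not specify the exact clustering configuration used in the study; the sketch below (function name, iteration count, and initialization are illustrative assumptions) shows the general technique applied to embedding vectors such as those produced by NPID.

```python
import numpy as np

def spherical_kmeans(X, k, n_iter=50, seed=0):
    """Illustrative spherical k-means: cluster rows of X by cosine similarity."""
    # L2-normalize feature vectors so dot products equal cosine similarity
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    rng = np.random.default_rng(seed)
    # Initialize centroids from randomly chosen (already unit-norm) samples
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign each vector to the centroid with the highest cosine similarity
        labels = (X @ centroids.T).argmax(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                c = members.sum(axis=0)
                # Re-project the mean direction back onto the unit sphere
                centroids[j] = c / np.linalg.norm(c)
    return labels, centroids
```

On NPID embeddings, clusters found this way group images by visual phenotype rather than by any human-assigned severity label.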
Main Outcome Measures
Accuracy and kappa statistics
Results
NPID demonstrated versatility across different AMD classification schemes without re-training, and achieved balanced accuracies comparable to supervised-trained networks or human ophthalmologists in classifying advanced AMD (82% vs. 81-92% or 89%), referable AMD (87% vs. 90-92% or 96%), or on the 4-step AMD severity scale (65% vs. 63-75% or 67%), despite never directly using these labels during self-supervised feature learning. Drusen area drove network predictions on the 4-step scale, while depigmentation and geographic atrophy (GA) areas correlated with advanced AMD classes. Self-supervised learning revealed grader-mislabeled images and susceptibility of some classes within the more granular 9-step AMD scale to misclassification by both ophthalmologists and neural networks. Importantly, self-supervised learning enabled data-driven discovery of AMD features such as GA and other ocular phenotypes of the choroid (e.g. tessellated or blonde fundi), vitreous (e.g. asteroid hyalosis), and lens (e.g. nuclear cataracts) that were not pre-defined by human labels.
Conclusions
Self-supervised learning enables AMD severity grading comparable to ophthalmologists and supervised networks, reveals biases of human-defined AMD classification systems, and allows unbiased, data-driven discovery of AMD and non-AMD ocular phenotypes.