Machine Learning Model Predicts 10-Year Cataract Surgery Risk

The research shows the software is “marginally superior” to conventional logistic models.

Machine learning modeling can predict with reasonable accuracy the 10-year risk of cataract surgery based on a patient’s self-reported questionnaire responses, according to research results published in the British Journal of Ophthalmology.

Researchers sought to evaluate how well multiple machine learning algorithms, compared with a conventional logistic model, predict the need for cataract surgery. They used data from the 45 and Up Study, a prospective, population-based Australian study with 10 years of follow-up.

The 45 and Up Study is a large-scale, prospective cohort study including adults aged 45 and older from New South Wales, Australia, who were randomly sampled from the general population. Participants completed a self-administered questionnaire at baseline; records were then linked across Australian health databases. 

In the current study, researchers used 1 traditional regression model and 3 “state-of-the-art” machine learning models to predict cataract surgery risk, and compared the relative performance of all 4 models. The data set was split 60-40 into training and validation cohorts, with models tuned via 10-fold cross-validation.
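As a rough illustration of the splitting scheme described above, the sketch below shuffles row indices, divides them 60-40 into training and validation sets, and then carves the training set into 10 cross-validation folds. It is not the researchers' code: the function names `split_60_40` and `k_folds`, the seed, and the 1000-row example size are all assumptions made for illustration.

```python
import random


def split_60_40(n_rows, seed=0):
    """Shuffle row indices and split them 60/40 (hypothetical helper)."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    cut = n_rows * 6 // 10  # integer arithmetic avoids float rounding
    return indices[:cut], indices[cut:]


def k_folds(indices, k=10):
    """Yield (fit, held_out) index lists for k-fold cross-validation."""
    # Deal the indices into k folds, like dealing cards.
    folds = [indices[i::k] for i in range(k)]
    for i, held_out in enumerate(folds):
        # Fit on every fold except the one currently held out.
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, held_out


# Illustrative size only; the study cohort was far larger.
train_idx, valid_idx = split_60_40(1000)
folds = list(k_folds(train_idx, k=10))
print(len(train_idx), len(valid_idx), len(folds))
```

In a setup like this, tuning choices are compared on the held-out folds within the training set, and the 40% validation set is touched only once at the end to report performance.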

The total cohort included 207,573 participants who were eligible for the final analysis. Spanning a median 9-year follow-up period, 11.4% of eligible participants had linked Medicare Benefits Schedule claims for cataract surgery. 

Cataract surgery incidence increased steadily with age: 5.31% of participants aged 45 to 64 years and 23.84% of participants aged 65 or older at baseline required cataract surgery during the follow-up period. The higher cumulative incidence of cataract surgery with increasing age was seen regardless of gender, although women had a higher cumulative incidence in both age groups during follow-up.

All 4 predictive models demonstrated “reasonably high predictive accuracy” for cataract surgery. The gradient boosting machine (GBM) in particular achieved an area under the curve (AUC) of 0.790, followed by the random forest (RF; AUC, 0.785), deep learning (DL; AUC, 0.781), and logistic regression (AUC, 0.767). The 3 machine learning models demonstrated statistically significant superior performance compared with the conventional logistic regression model.
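For readers unfamiliar with the metric, the AUC can be read as the probability that a model assigns a higher risk score to a randomly chosen patient who went on to have surgery than to a randomly chosen patient who did not. The sketch below computes it directly from that definition. The function name `auc` and the six labels and scores are invented for illustration and are not study data.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = had surgery, 0 = did not.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4]
print(round(auc(labels, scores), 3))
```

An AUC of 0.5 corresponds to ranking no better than chance and 1.0 to perfect ranking, which puts the reported values of roughly 0.77 to 0.79 in context. The pairwise loop is quadratic in the number of patients, so a rank-based implementation would be the practical choice for a cohort of this size.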

All models consistently identified age as the most important predictor of cataract surgery; it accounted for 30% to 95% of the variance in the deep learning and logistic regression models alone. Health insurance was the second most important predictor in the deep learning, RF, and logistic regression models.

External validation of the prediction models achieved similar results, with AUCs of 0.768, 0.786, 0.790, and 0.782 for the logistic, RF, GBM, and DL models, respectively. 

Study limitations include the potentially biased definition of incident cataract surgery used in the study, and a lack of evaluation of the “expected applications of [machine learning] models.”

“[W]e have applied [machine learning] methods and established high accurate risk prediction models for cataract surgery, based on non-clinical self-reported questionnaires,” the researchers conclude. “Future applications of [machine learning]-based prediction models using large datasets can facilitate powerful disease prediction tools and inform public health change at both individual and governmental levels.”


Wang W, Han X, Zhang J, et al. Predicting the 10-year risk of cataract surgery using machine learning techniques on questionnaire data: findings from the 45 and Up Study. Br J Ophthalmol. Published online May 26, 2021. doi: 10.1136/bjophthalmol-2020-318609