Machine Learning-Based Risk Prediction Model for Leprosy-Related Disability Among Leprosy Patients: A 39-Year Observed Cohort Study
Objective
To develop and validate interpretable machine learning (ML) models for predicting the risk of disability in people with leprosy.
Patients and Methods
Data on leprosy patients admitted between 1985 and 2023 in Wenshan Prefecture, China, were extracted from the Leprosy Management Information System during the study period (January 1 to December 31, 2024). The dataset comprised 2504 patients in the training set and 1073 patients in the test set. Predictive models were constructed using nine ML techniques. Each model was assessed using several evaluation metrics, including the area under the receiver operating characteristic curve (AUC). Additionally, the SHapley Additive exPlanations (SHAP) technique was used to rank feature importance.
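As an illustration of the main evaluation metric, the AUC can be read as the probability that a randomly chosen patient with disability receives a higher predicted risk than a randomly chosen patient without disability. The sketch below is a minimal standard-library version of that definition; the function name `auc`, the toy labels, and the toy scores are hypothetical and are not taken from the study data or code. It also computes the training-set share implied by the reported counts.

```python
from itertools import product


def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # A pair is concordant when the positive case scores higher;
    # tied scores count as half a concordant pair.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = disability, 0 = no disability.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
toy_auc = auc(labels, scores)  # 11 of 12 pairs are concordant

# Share of patients in the training set, from the reported counts.
train_share = 2504 / (2504 + 1073)
```

The pairwise form is quadratic in the number of patients, so it suits small examples; a rank-based implementation would be the usual choice for a cohort of this size.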
Results
The random forest (RF) model exhibited the highest performance. It accurately predicted the risk of leprosy-related disability in both the internal validation of the training set (AUC 0·924, 95% CI 0·913–0·936) and the external test set (AUC 0·699, 95% CI 0·663–0·734). The calibration curve demonstrated strong agreement between the predicted and observed risks. The SHAP analysis identified 15 key variables: treatment delay (days), detection mode, age, region of residence, socioeconomic status, bacterial index, marital status, infection source, policy period at leprosy diagnosis, clinical form, treatment regimen, leprosy relapse, highest level of education, status of infected family members, and treatment location.
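To make the variable ranking concrete, the sketch below computes exact Shapley values, the quantity that SHAP estimates, for a single hypothetical patient. It averages each feature's marginal contribution over every feature ordering and then ranks features by absolute contribution. The scoring function `toy_risk`, its coefficients, and the choice of three features are assumptions made for illustration only; they are not the fitted RF model or the study's results.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values for one instance relative to a baseline input."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        prev = model(current)
        for i in order:
            # Switch feature i from its baseline value to the patient's value
            # and credit the change in model output to that feature.
            current[i] = x[i]
            new = model(current)
            phi[i] += new - prev
            prev = new
    return [p / len(orderings) for p in phi]


# Hypothetical risk score with an interaction between two features.
def toy_risk(features):
    delay, age, bacterial_index = features
    return 0.5 * delay + 0.2 * age + 0.1 * bacterial_index + 0.2 * delay * bacterial_index


names = ["treatment_delay", "age", "bacterial_index"]
patient = [1.0, 1.0, 1.0]   # hypothetical standardized values
baseline = [0.0, 0.0, 0.0]  # hypothetical reference patient

phi = shapley_values(toy_risk, patient, baseline)

# Rank features by the magnitude of their contribution, largest first.
ranking = sorted(zip(names, phi), key=lambda pair: abs(pair[1]), reverse=True)
```

The contributions sum to the difference between the patient's score and the baseline score, and the interaction term is split evenly between the two features involved. Exact enumeration grows factorially with the number of features, which is why approximations are needed for a model with 15 or more variables.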
Conclusion
An interpretable RF model for predicting disability risk in people with leprosy was developed and validated using 39 years of follow-up cohort data. The model can help clinicians identify high-risk patients early and provide individualized intervention programs.