TY  - JOUR
KW  - Risk factors
KW  - Disability
KW  - Neglected Diseases
KW  - machine learning
KW  - Epidemiology
KW  - leprosy
AU  - Freitas LRSD
AU  - Freitas JAOD
AU  - Penna GO
AU  - Duarte EC
AB  - The severity of physical disability at leprosy diagnosis reflects the timeliness of case detection and the effectiveness of disease surveillance. This study evaluates machine learning models for predicting late leprosy diagnosis, defined as grade 2 physical disability (G2D), and for identifying the factors associated with it, in Brazil from 2018 to 2022. Using an observational cross-sectional design, we analyzed data from the Notifiable Diseases Information System and trained four machine learning models (Random Forest, LightGBM, CatBoost, and XGBoost) together with an Ensemble model. Model performance was assessed through accuracy, area under the receiver operating characteristic curve (AUC-ROC), recall, precision, F1 score, specificity, and Matthews correlation coefficient (MCC). An increasing trend in G2D prevalence was observed, averaging 11.6% over the study period and rising to 13.1% in 2022. The Ensemble model and LightGBM demonstrated the highest predictive performance, particularly in the north and northeast regions (accuracy: 0.85, AUC-ROC: 0.93, recall: 0.90, F1 score: 0.83, MCC: 0.70), with similar results in other regions. Key predictors of G2D included the number of nerves affected, clinical form, education level, and case detection mode. These findings underscore the potential of machine learning to enhance early detection strategies and reduce the burden of disability in leprosy, particularly in regions with persistent health disparities.
BT  - Tropical Medicine and Infectious Disease
DO  - 10.3390/tropicalmed10050131
IS  - 5
LA  - eng
M3  - Research Article
PB  - MDPI AG
PY  - 2025
EP  - 131
T2  - Tropical Medicine and Infectious Disease
TI  - Evaluating Machine Learning Models for Predicting Late Leprosy Diagnosis by Physical Disability Grade in Brazil (2018–2022)
UR  - https://www.mdpi.com/2414-6366/10/5/131
VL  - 10
SN  - 2414-6366
ER  - 