TY - JOUR
T1 - Development of machine learning models to predict gestational diabetes risk in the first half of pregnancy
AU - Cubillos, Gabriel
AU - Monckeberg, Max
AU - Plaza, Alejandra
AU - Morgan, Maria
AU - Estevez, Pablo A.
AU - Choolani, Mahesh
AU - Kemp, Matthew W.
AU - Illanes, Sebastian E.
AU - Perez, Claudio A.
N1 - Publisher Copyright: © 2023, The Author(s).
PY - 2023/12
Y1 - 2023/12
N2 - Background: Early prediction of Gestational Diabetes Mellitus (GDM) risk is of particular importance as it may enable more efficacious interventions and reduce cumulative injury to mother and fetus. The aim of this study is to develop machine learning (ML) models for the early prediction of GDM using widely available variables, facilitating early intervention and making it possible to apply the prediction models in places where there is no access to more complex examinations. Methods: The dataset used in this study includes registries from 1,611 pregnancies. Twelve different ML models and their hyperparameters were optimized to achieve early, high-performance prediction of GDM. A data augmentation method was used in training to improve prediction results. Three methods were used to select the most relevant variables for GDM prediction. After training, the models with the highest Area under the Receiver Operating Characteristic Curve (AUCROC) were assessed on the validation set. Models with the best results were assessed on the test set as a measure of generalization performance. Results: Our method identifies many candidate models for various levels of sensitivity and specificity. Four models achieved a high sensitivity of 0.82, a specificity in the range 0.72–0.74, an accuracy in the range 0.73–0.75, and an AUCROC of 0.81. These models required between 7 and 12 input variables. Alternatively, a model requiring just 5 variables achieved a sensitivity of 0.89, an accuracy of 0.65, a specificity of 0.62, and an AUCROC of 0.82. Conclusions: The principal findings of our study are: early prediction of GDM in the early stages of pregnancy using regular examinations; the development and optimization of twelve different ML models and their hyperparameters to achieve the highest prediction performance; and a novel data augmentation method that enables excellent GDM prediction results with various models.
KW - Data augmentation
KW - GDM risk prediction
KW - Gestational diabetes mellitus (GDM)
KW - Machine learning models
KW - Widely available variables
UR - http://www.scopus.com/inward/record.url?scp=85162762606&partnerID=8YFLogxK
U2 - 10.1186/s12884-023-05766-4
DO - 10.1186/s12884-023-05766-4
M3 - Article
C2 - 37353749
AN - SCOPUS:85162762606
SN - 1471-2393
VL - 23
SP - 1
EP - 18
JO - BMC Pregnancy and Childbirth
JF - BMC Pregnancy and Childbirth
IS - 1
M1 - 469
ER -