Using Machine Learning Algorithms for Accurate Prediction of Diabetes

Authors

  • Emmanuel Imuede Oyasor
  • Adedeji Daniel Gbadebo, Walter Sisulu University, Mthatha, Eastern Cape, South Africa

DOI:

https://doi.org/10.33022/ijcs.v13i6.4488

Keywords:

Diabetes Prediction, Machine Learning, Support Vector Machine, AdaBoost, Neural Networks, K-Nearest Neighbors, Random Forest, Logit Boost

Abstract

Diabetes has proven to be one of the most threatening illnesses affecting the human body. It is associated with many serious consequences, including blindness, kidney failure, amputations, heart failure, and microvascular and macrovascular complications; it affects millions of people across the world and has contributed to increased mortality. Studies show that effective management and early detection of diabetes remain crucial for preventing its complications and improving patient outcomes. Using available data, we apply machine learning algorithms, including Support Vector Machine (SVM), AdaBoost (ADA), Neural Networks (NNET), K-Nearest Neighbors (KNN), Random Forest (RF), and Logit Boost (LOGIT), to predict diabetes accurately among patients. We find that Logit Boost and AdaBoost stand out as the top performers for predicting diabetic patients, with balanced and reliable performance across the evaluation metrics: they exhibit high accuracy and strong AUC scores, making them suitable for this classification task. Neural Networks show excellent precision and low log loss, indicating strong probabilistic predictions, but their lower specificity implies a higher false-positive rate. Random Forest achieves good recall but lower accuracy on the test set, indicating potential overfitting to the training data. SVM and KNN perform the weakest across most metrics, suggesting they may not be the best choices for this prediction task.
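The abstract compares the classifiers on accuracy, precision, recall, specificity, AUC, and log loss without defining them. As a minimal sketch (not the authors' code; the abstract does not state their tooling), the function below computes each metric for a binary classifier from true labels and predicted probabilities in plain Python. The function name `binary_metrics`, the 0.5 decision threshold, the encoding 1 = diabetic, and the toy labels and probabilities are illustrative assumptions, not settings or data from the study. AUC is computed with the pairwise (Mann-Whitney) formulation, which equals the area under the ROC curve.

```python
import math


def binary_metrics(y_true, y_prob, threshold=0.5):
    """Evaluation metrics for a binary classifier (hypothetical helper, illustration only).

    y_true: 0/1 labels, assuming 1 = diabetic and 0 = non-diabetic.
    y_prob: predicted probabilities of class 1, one per label.
    threshold: assumed cut-off that turns probabilities into class predictions.
    """
    n = len(y_true)
    y_pred = [1 if p >= threshold else 0 for p in y_prob]

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num, den):
        # Return 0.0 instead of dividing by zero when the denominator is empty.
        return num / den if den else 0.0

    # Log loss: mean negative log-likelihood; probabilities are clipped to avoid log(0).
    eps = 1e-15
    log_loss = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        log_loss -= t * math.log(p) + (1 - t) * math.log(1 - p)
    log_loss /= n

    # AUC: probability that a random positive outranks a random negative (ties count 1/2).
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    auc = ratio(wins, len(pos) * len(neg))

    return {
        "accuracy": ratio(tp + tn, n),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),        # also called sensitivity
        "specificity": ratio(tn, tn + fp),   # false-positive rate = 1 - specificity
        "log_loss": log_loss,
        "auc": auc,
    }


if __name__ == "__main__":
    # Toy values invented for illustration; they are not the study's data.
    labels = [1, 0, 1, 1, 0, 0]
    probs = [0.9, 0.6, 0.8, 0.4, 0.2, 0.7]
    for name, value in binary_metrics(labels, probs).items():
        print(f"{name}: {value:.3f}")
```

The specificity entry makes the abstract's remark about Neural Networks concrete: the false-positive rate is 1 - specificity, so lower specificity means more non-diabetic patients flagged as diabetic. Libraries such as scikit-learn provide equivalent functions (`accuracy_score`, `roc_auc_score`, `log_loss`); writing them out here only keeps the sketch dependency-free.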


Published

30-12-2024