EARLY DETECTION OF DIABETES USING LOGISTIC REGRESSION: RISK FACTOR ANALYSIS AND PROBABILISTIC PREDICTION
Keywords:
Diabetes Prediction, Logistic Regression, Machine Learning, Hyperparameter Tuning, Risk Factors
Abstract
Diabetes mellitus is a prevalent chronic disease with significant global health implications. It is characterized by disruptions in glucose metabolism that can lead to severe complications such as heart disease, kidney failure, and vision impairment, so early detection is critical for effective management and prevention. This study developed a Logistic Regression-based predictive model to identify individuals at high risk of diabetes, using a dataset of 253,680 records covering health, lifestyle, and socioeconomic factors. The dataset was preprocessed and split into training (80%) and testing (20%) sets to ensure robust model evaluation. Hyperparameter tuning using Grid Search with 5-fold cross-validation identified the optimal configuration: L2 regularization, the liblinear solver, and an inverse regularization strength (C) of 0.01, which improved the model's generalization and reduced overfitting. The model achieved an accuracy of 84.56%, precision of 81.60%, recall of 84.56%, F1-score of 80.68%, and ROC AUC of 81.37%, demonstrating its effectiveness in distinguishing between individuals with and without diabetes. Feature importance analysis highlighted key predictors such as general health, BMI, age, and lifestyle factors, emphasizing the role of both clinical and socioeconomic determinants in diabetes risk. While the model shows promise for clinical application, further refinements to reduce false positives and false negatives are recommended. This study underscores the potential of machine learning in supporting early diabetes detection and risk management, contributing to improved patient outcomes and targeted preventive strategies.
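The workflow summarized in the abstract (an 80/20 split, Grid Search with 5-fold cross-validation, and a liblinear Logistic Regression) can be illustrated with a minimal sketch. It assumes scikit-learn and uses synthetic data from `make_classification` as a stand-in for the 253,680-record dataset. The parameter grid, the `roc_auc` scoring choice, the class balance, and the random seeds are illustrative assumptions and are not taken from the study, so the scores it produces are unrelated to the reported results.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the survey data (assumption: the real dataset
# is not reproduced here; sizes and class balance are arbitrary).
X, y = make_classification(
    n_samples=2000, n_features=21, weights=[0.85, 0.15], random_state=0
)

# 80% training / 20% testing split, stratified to preserve class ratios.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical search grid; the study reports only the winning setting
# (L2 penalty, liblinear solver, C = 0.01).
param_grid = {
    "C": [0.01, 0.1, 1.0, 10.0],
    "penalty": ["l1", "l2"],
    "solver": ["liblinear"],
}

# Grid Search with 5-fold cross-validation on the training set only.
search = GridSearchCV(
    LogisticRegression(max_iter=1000), param_grid, cv=5, scoring="roc_auc"
)
search.fit(X_train, y_train)

# Evaluate the refitted best model on the held-out test set.
best_model = search.best_estimator_
risk_proba = best_model.predict_proba(X_test)[:, 1]  # predicted probability of class 1
test_accuracy = accuracy_score(y_test, best_model.predict(X_test))
test_auc = roc_auc_score(y_test, risk_proba)

print("best parameters:", search.best_params_)
print(f"test accuracy: {test_accuracy:.4f}, test ROC AUC: {test_auc:.4f}")
```

The `predict_proba` output is what makes the prediction probabilistic: each individual receives a risk score between 0 and 1 rather than only a class label, and the decision threshold can then be adjusted to trade false positives against false negatives.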