BREAST CANCER CLASSIFICATION USING MACHINE LEARNING
Keywords:
Breast Cancer, Wisconsin Diagnostic Breast Cancer (WDBC), Support Vector Machine, Random Forest, Hyperparameter Optimization, Statistical Validation
Abstract
Breast cancer remains one of the most prevalent malignancies, and early, accurate diagnosis is critical to improving patient outcomes. This study investigates the performance of five supervised machine learning algorithms, namely Support Vector Machine (SVM), Decision Tree (DT), k-Nearest Neighbors (KNN), Random Forest (RF), and Logistic Regression (LR), for automated breast cancer classification on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset. The dataset contains 569 samples with 30 numerical features extracted from digitized fine needle aspirate (FNA) images, each labeled as benign or malignant. The experimental protocol employs an 80/20 stratified train–test split, feature standardization for scale-sensitive models, and RandomizedSearchCV with 5-fold cross-validation for hyperparameter optimization. Models are evaluated using accuracy, precision, recall (sensitivity), specificity, F1-score, ROC-AUC, confusion matrices, and cross-validation statistics, complemented by approximate 95% confidence intervals and McNemar’s test for pairwise comparison. The optimized SVM with a radial basis function (RBF) kernel achieves a test accuracy of 98.25%, precision of 100%, recall of 95.24%, specificity of 100%, and ROC-AUC of 0.9960, outperforming the other models, with statistically significant improvements over DT and KNN. Feature importance analysis from the tree-based models highlights “worst” size and shape descriptors (area_worst, perimeter_worst, radius_worst, concave_points_worst) as the dominant predictors, in line with the cytopathological understanding of malignant nuclei. The results demonstrate that properly tuned traditional models can provide robust and interpretable performance on tabular medical data, and they establish a reproducible baseline for future research in breast cancer classification.
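
As an illustration of the protocol summarized above, the following is a minimal sketch for the SVM case only, assuming scikit-learn and SciPy are available. The random seed, the number of search iterations, and the search ranges for C and gamma are assumptions made for this example; they are not values reported in the study. The copy of WDBC bundled with scikit-learn is used as a stand-in for the dataset.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# scikit-learn ships a copy of WDBC: 569 samples, 30 numerical features.
X, y = load_breast_cancer(return_X_y=True)

# 80/20 stratified split. The seed (42) is an arbitrary assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Standardization sits inside the pipeline so that, in every CV fold,
# the scaler is fitted on that fold's training portion only.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("svm", SVC(kernel="rbf")),
])

# Assumed (hypothetical) search space, not taken from the study.
param_distributions = {
    "svm__C": loguniform(1e-2, 1e3),
    "svm__gamma": loguniform(1e-4, 1e0),
}

# Randomized search with 5-fold cross-validation on the training set.
search = RandomizedSearchCV(
    pipeline,
    param_distributions=param_distributions,
    n_iter=20,          # assumed budget
    cv=5,
    scoring="accuracy",
    random_state=42,
)
search.fit(X_train, y_train)

# Accuracy of the refitted best pipeline on the held-out 20%.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 4))
```

Keeping the scaler inside the pipeline, rather than standardizing the full training set once before the search, is a design choice that avoids leaking validation-fold statistics into the cross-validation scores. The exact accuracy obtained depends on the seed and search budget and need not match the figures reported above.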
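
The evaluation statistics named above can be sketched with the Python standard library alone. The confusion-matrix counts used here (40 true positives, 2 false negatives, 72 true negatives, 0 false positives) are hypothetical: they were chosen only because they reproduce the reported percentages on a 114-sample test set, and the actual confusion matrix is not given in this abstract. The confidence interval uses the normal (Wald) approximation and the McNemar test uses the exact binomial form; both are assumptions about which variants were applied, and the discordant counts in the last line are likewise made up for illustration.

```python
from math import comb, sqrt


def classification_metrics(tp, fn, tn, fp):
    """Standard metrics from confusion-matrix counts (malignant = positive)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # sensitivity
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }


def wald_interval(p, n, z=1.96):
    """Approximate 95% CI for a proportion p observed on n samples."""
    half_width = z * sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: samples model A classified correctly and model B incorrectly.
    c: samples model B classified correctly and model A incorrectly.
    """
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical counts consistent with the reported percentages.
metrics = classification_metrics(tp=40, fn=2, tn=72, fp=0)
ci_low, ci_high = wald_interval(metrics["accuracy"], n=114)

# Hypothetical discordant-pair counts for one pairwise comparison.
p_value = mcnemar_exact(b=1, c=9)
```

With so few errors on a test set of this size, the Wald interval reaches the upper bound of 1.0 and is clipped there, which is one reason such intervals are described as approximate.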



