SVM (SUPPORT VECTOR MACHINE) CLASSIFICATION AND COMBINED FEATURE SELECTION IN BREAST CANCER DIAGNOSIS

ABSTRACT: Diagnosis of breast cancer has been widely implemented using machine learning. However, in medical data analysis, breast cancer diagnosis usually faces high-dimensional feature sets, which sometimes contain features that are irrelevant to the classification process. Feature selection eliminates irrelevant features and can thereby improve diagnostic performance. The objective of this research is to develop a feature selection method for breast cancer diagnosis based on a combination of rough set and F-score. The combined feature selection was evaluated on the Wisconsin Breast Cancer Dataset (WBCD). The two methods are applied sequentially: rough set is applied first, and the reduced feature subset it produces is then further selected with the F-score feature selection method. The improvement in diagnostic performance was evaluated using the average sensitivity, ROC AUC, accuracy, and running time over 100 experimental runs. Furthermore, the performance of the feature selection methods was compared when applied individually and in combination. The results show that the combination of the F-score feature selection method and rough set yields the optimal feature subset and superior performance compared with F-score and rough set applied individually, obtaining a sensitivity of 0.9714, ROC AUC of 0.9700, accuracy of 97.05%, and a running time of 0.0722 s.

Keywords: Feature selection, Rough set, F-score, SVM classification, Breast cancer diagnosis
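The second stage of the pipeline described above, ranking the features that survive rough set reduction by their F-score, can be illustrated with a minimal sketch. It assumes the commonly used two-class F-score (squared distances of the class means from the overall mean, divided by the sum of the per-class sample variances); the abstract does not state the exact formula, so this definition is an assumption. The function names `f_score` and `rank_features`, the `candidates` argument standing in for the rough set reduct, and the toy data are all hypothetical. The rough set step itself is not implemented here.

```python
from statistics import mean, variance


def f_score(pos, neg):
    """F-score of one feature, given its values in the positive and negative class.

    Assumed definition: between-class scatter of the class means around the
    overall mean, divided by the sum of the per-class sample variances.
    Each class needs at least two samples for the sample variance.
    """
    overall = mean(pos + neg)
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)  # sample variances (n - 1 denominator)
    if within == 0:
        # Zero spread inside both classes: perfectly separating or constant feature.
        return float("inf") if between > 0 else 0.0
    return between / within


def rank_features(rows, labels, candidates):
    """Rank the candidate feature indices by descending F-score.

    `candidates` is assumed to be the subset already kept by rough set
    reduction; labels are 1 (positive) or 0 (negative).
    """
    scores = {}
    for j in candidates:
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        scores[j] = f_score(pos, neg)
    order = sorted(scores, key=scores.get, reverse=True)
    return order, scores


# Toy data (not from WBCD): feature 0 separates the classes, feature 1 does not.
rows = [[1.0, 5.0], [2.0, 3.0], [8.0, 4.0], [9.0, 6.0]]
labels = [0, 0, 1, 1]
order, scores = rank_features(rows, labels, candidates=[0, 1])
print(order, scores)
```

In a full pipeline the top-ranked features would then be passed to an SVM classifier; how many features to keep is a design choice that the abstract does not specify.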