SMOTE AND GINI SCORE TECHNIQUES IN BREAST CANCER CLASSIFICATION

  • Nur Ghaniaviyanto Ramadhan Institut Teknologi Telkom Purwokerto
  • Faisal Dharma Adhinata Institut Teknologi Telkom Purwokerto
Keywords: Breast Cancer; Imbalanced Data; Feature Selection; Random Forest; Naïve Bayes

Abstract

Breast cancer is a malignancy of the breast tissue that can originate from the epithelium of the ducts and lobules. According to the WHO, 30% to 50% of cancer cases can be prevented. Breast cancer prevention can be carried out through screening or early diagnosis. The purpose of early diagnosis is that, if a lump appears, a prediction can be made as to whether it is malignant or benign. Breast cancer prediction can be performed using a dataset containing cancer-related parameters. However, such datasets sometimes have problems, such as imbalanced class sizes and irrelevant features. This study aims to improve breast cancer prediction results by balancing the number of samples per class and ranking the features. The methods used are SMOTE for handling imbalanced data and the Gini score for feature ranking. The classification models used are random forest and naïve Bayes. The results show that the random forest classification model outperforms naïve Bayes.
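The two preprocessing ideas named in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: the abstract does not say how the Gini score is computed, so the sketch assumes it is the largest Gini impurity decrease achievable with a single threshold on one feature. The function names (`gini_impurity`, `gini_score`, `smote`) and the toy data are hypothetical, and the classifiers themselves are omitted.

```python
import random

def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def gini_score(values, labels):
    """Assumed scoring rule: best impurity decrease over single-threshold splits."""
    parent = gini_impurity(labels)
    n = len(labels)
    best = 0.0
    for t in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        weighted = (len(left) * gini_impurity(left)
                    + len(right) * gini_impurity(right)) / n
        best = max(best, parent - weighted)
    return best

def smote(minority, n_new, k=1, seed=0):
    """SMOTE-style oversampling: interpolate between a minority sample and
    one of its k nearest minority neighbours (needs at least two samples)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic

# Toy data (made up): feature 0 separates the classes, feature 1 is noise.
X = [[1.0, 5.0], [1.2, 1.0], [0.9, 3.0], [1.1, 4.0],
     [1.3, 2.0], [1.0, 0.5], [3.0, 4.5], [3.2, 1.5]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

# Step 1: oversample the minority class until both classes have equal size.
minority = [row for row, label in zip(X, y) if label == 1]
synthetic = smote(minority, y.count(0) - y.count(1))
X_bal = X + synthetic
y_bal = y + [1] * len(synthetic)

# Step 2: rank features by their Gini score on the balanced data.
scores = [gini_score([row[j] for row in X_bal], y_bal) for j in range(2)]
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print("class counts:", y_bal.count(0), y_bal.count(1))
print("feature ranking:", ranking)
```

In practice the balanced, ranked data would then be passed to the random forest and naïve Bayes classifiers; balancing should be applied to the training split only, so that synthetic samples do not leak into the evaluation data.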

Published
2021-12-18
Section
Articles