SMOTE-ENN-LR: LEVERAGING MACHINE LEARNING FOR BREAST CANCER CLASSIFICATION IN MICROARRAY GENE EXPRESSION WITH EXPLAINABLE AI

Md Faisal Bin Abdul Aziz; Azree Nazri; Fatematuz Zuhura Evamoni; Razali Yaakob; Teh Noranis Mohd Aris; Zamberi Sekawi; Tanjim Mahmud; Olalekan Agbolade; Wajid Syed; Mohamed N Al Arifi

doi:10.22452/mjcs.vol38no2.4

Authors

Md Faisal Bin Abdul Aziz Department of Computer Science, Universiti Putra Malaysia, Serdang, Malaysia
Azree Nazri Department of Computer Science, Universiti Putra Malaysia, Serdang, Malaysia (Corresponding Author)
Fatematuz Zuhura Evamoni Dept. of Biotechnology and Genetic Engineering, Noakhali Science and Technology University, Bangladesh
Razali Yaakob Department of Computer Science, Universiti Putra Malaysia, Serdang, Malaysia
Teh Noranis Mohd Aris Department of Computer Science, Universiti Putra Malaysia, Serdang, Malaysia
Zamberi Sekawi Department of Medical Microbiology, Universiti Putra Malaysia, Serdang, Malaysia
Tanjim Mahmud Department of Computer Science and Engineering, Rangamati Science and Technology University, Bangladesh
Olalekan Agbolade Department of Computer Science, Universiti Putra Malaysia, Serdang, Malaysia
Wajid Syed Department of Clinical Pharmacy, College of Pharmacy, King Saud University, Saudi Arabia
Mohamed N Al Arifi Department of Clinical Pharmacy, College of Pharmacy, King Saud University, Saudi Arabia

Keywords:

Breast cancer, Gene expression, Machine learning, Logistic Regression, Classification, Explainable AI

Abstract

Breast cancer remains a major public health issue worldwide, ranking as the second leading cause of cancer-related deaths among women. Effective early detection and classification are crucial for improving survival rates, yet they are complicated by the class imbalance typical of microarray gene expression datasets. Such imbalance can significantly degrade the predictive power and reliability of traditional classification models, underscoring the need for more sophisticated analytical techniques. This study introduces the SMOTE-ENN-LR method, which combines the Synthetic Minority Over-sampling Technique (SMOTE), Edited Nearest Neighbors (ENN) for noise removal, and Logistic Regression (LR) to classify breast cancer from microarray data. SMOTE over-samples the minority class, addressing its underrepresentation. ENN then cleans the data by removing mislabeled instances and noise, which are often prevalent in over-sampled datasets. The cleaned dataset is used to train an LR model to distinguish cancerous (Abnormal) from non-cancerous (Normal) gene expression profiles. Our evaluation shows that the SMOTE-ENN-LR method attained a classification accuracy of 97.14%, outperforming contemporary state-of-the-art methods. This improvement highlights the potential of combining advanced data preprocessing techniques with robust statistical learning models to tackle the inherent challenges of microarray data analysis. Further, we employ Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP) to offer insight into the model's decision-making process, enhancing the transparency and interpretability of its predictions.
Moreover, the success of the SMOTE-ENN-LR method in this study paves the way for its application in other areas of medical diagnostics where similar data imbalances may impair the accuracy and effectiveness of disease classification. These results substantiate the effectiveness of the SMOTE-ENN-LR approach in managing the complexities of imbalanced microarray gene expression data and suggest a promising path for future research in medical bioinformatics and precision medicine.
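The three stages named in the abstract (over-sample with SMOTE, clean with ENN, fit LR) can be illustrated with a minimal sketch. This is not the authors' implementation: it is a simplified, standard-library-only version for a binary problem, and the toy two-feature data, the function names (`smote`, `enn`, `fit_logistic`, `predict`), and all parameter values (`k`, learning rate, epochs) are assumptions chosen for illustration. In practice one would more likely use a library such as imbalanced-learn (`imblearn.combine.SMOTEENN`) together with scikit-learn's `LogisticRegression`.

```python
import math
import random


def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(idx, X, candidates, k):
    """Indices of the k candidates closest to X[idx], excluding idx itself."""
    others = [j for j in candidates if j != idx]
    return sorted(others, key=lambda j: dist2(X[idx], X[j]))[:k]


def smote(X, y, minority, k=5, rng=None):
    """Add synthetic minority points until both classes have equal size."""
    rng = rng or random.Random(0)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_new = (len(y) - len(min_idx)) - len(min_idx)
    X_out, y_out = list(X), list(y)
    for _ in range(max(n_new, 0)):
        i = rng.choice(min_idx)                   # a real minority sample
        j = rng.choice(nearest(i, X, min_idx, k)) # one of its minority neighbours
        gap = rng.random()                        # interpolation factor in [0, 1)
        X_out.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
        y_out.append(minority)
    return X_out, y_out


def enn(X, y, k=3):
    """Keep only points whose label matches the majority of their k nearest neighbours."""
    all_idx = range(len(y))
    keep = []
    for i in all_idx:
        votes = [y[j] for j in nearest(i, X, all_idx, k)]
        if votes.count(y[i]) * 2 > len(votes):
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Logistic regression fitted by plain batch gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    n = len(y)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(w, b, x):
    """Return 1 if the predicted probability is at least 0.5, else 0."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


# Toy imbalanced data (assumed, not from the study): 40 samples of class 0, 8 of class 1.
rng = random.Random(42)
X = [[rng.gauss(0, 0.7), rng.gauss(0, 0.7)] for _ in range(40)] + \
    [[rng.gauss(3, 0.7), rng.gauss(3, 0.7)] for _ in range(8)]
y = [0] * 40 + [1] * 8

X_bal, y_bal = smote(X, y, minority=1, k=5, rng=rng)  # step 1: over-sample
X_clean, y_clean = enn(X_bal, y_bal, k=3)             # step 2: remove noisy points
w, b = fit_logistic(X_clean, y_clean)                 # step 3: train the classifier
```

Running the resampling before the classifier, as here, means the model is trained on a balanced and cleaned set. In a real evaluation the resampling would be applied to the training split only, so that synthetic points never leak into the test data.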