SMS Spam Detection Using Machine Learning Approach
Abstract
As mobile phones have become more popular, the use of Short Message Service (SMS) has grown tremendously, and the low cost of messaging has led to a rise in spam, or unsolicited messages, sent to mobile phones. SMS spam has become a pressing issue in mobile communication, disrupting user experience and posing privacy threats. Spam filtering for text messages differs from spam filtering for email in several ways. Large email datasets are available, whereas real SMS spam databases are very limited. Because text messages are short, fewer features are available for classification than in emails. Text messages also contain abbreviations and use less formal language than emails. This study develops a machine learning-based framework for detecting SMS spam using a multi-layer classifier, with the aim of building a robust and efficient spam detection system. The raw SMS dataset is first prepared through a data preprocessing procedure. The term frequency-inverse document frequency (TF-IDF) technique is then used for feature extraction, representing the text numerically so that the model can capture the characteristics that distinguish spam from non-spam messages. The model was trained on the preprocessed data and evaluated through cross-validation. The results highlight the scalability and reliability of the approach: the multi-layer classifier delivers effective spam detection while avoiding overfitting, offering a practical solution to the persistent problem of SMS spam and improving user security in mobile communication.
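The abstract describes a three-stage pipeline (preprocessing, TF-IDF features, a multi-layer classifier evaluated by cross-validation) but does not specify the cleaning rules, network architecture, or hyperparameters. The following is a minimal sketch under stated assumptions: scikit-learn's `TfidfVectorizer` and `MLPClassifier` stand in for the paper's TF-IDF step and multi-layer classifier, and the `preprocess` helper, the toy messages, and every hyperparameter value (n-gram range, hidden layer size, `alpha`, five folds) are hypothetical illustrations, not the authors' settings.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline


def preprocess(text: str) -> str:
    """Hypothetical cleaning step: lowercase, mask URLs and digit runs, drop other symbols."""
    text = text.lower()
    text = re.sub(r"(https?://\S+|www\.\S+)", " urltoken ", text)  # each link becomes one token
    text = re.sub(r"\d+", " numtoken ", text)  # phone numbers, prize amounts, short codes
    text = re.sub(r"[^a-z\s]", " ", text)  # remaining punctuation and symbols
    return re.sub(r"\s+", " ", text).strip()


# Invented toy messages (label 0 = ham, 1 = spam); not taken from any real dataset.
ham = [
    "Are we still meeting for lunch today?",
    "I'll call you when I get home",
    "Can you pick up some milk on the way back",
    "Happy birthday! Hope you have a great day",
    "Running late, see you in ten minutes",
    "Did you finish the report for tomorrow",
    "Thanks for dinner last night, it was lovely",
    "Mum says hi, call her when you can",
    "The meeting has moved to room 4 at 3pm",
    "Ok see you at the station",
]
spam = [
    "WINNER! You have won a 1000 cash prize. Call 09061701461 now",
    "Free entry to win a new phone, text WIN to 80086",
    "URGENT: your account is suspended, verify at http://example.com/verify",
    "Congratulations, claim your free holiday voucher now, reply YES",
    "You have been selected for a cash reward, call 08001234567",
    "Get a free ringtone today, text TONE to 87121",
    "Claim your prize now at www.example.com before it expires",
    "Lowest loan rates guaranteed, reply LOAN for a free quote",
    "Your mobile number won 2000 pounds, call now to claim",
    "Free camera phone offer, text CAM to 85023 now",
]
texts = ham + spam
labels = [0] * len(ham) + [1] * len(spam)

# TF-IDF features feeding a small multi-layer perceptron (one hidden layer of 32 units).
# alpha is the MLP's L2 penalty, used here as a simple guard against overfitting.
# A custom preprocessor replaces the vectorizer's built-in lowercasing, so preprocess lowercases itself.
model = make_pipeline(
    TfidfVectorizer(preprocessor=preprocess, ngram_range=(1, 2), sublinear_tf=True),
    MLPClassifier(hidden_layer_sizes=(32,), alpha=1e-3, max_iter=500, random_state=0),
)

# 5-fold cross-validation (stratified by default for classifiers), scored by spam-class F1.
scores = cross_val_score(model, texts, labels, cv=5, scoring="f1")
print("F1 per fold:", scores.round(3))
print(f"Mean F1: {scores.mean():.3f}")

# Refit on all messages and classify two unseen ones.
model.fit(texts, labels)
preds = model.predict(["Free prize waiting, call 09061234567 now", "See you at lunch tomorrow"])
print("Predictions:", preds)
```

Keeping the vectorizer inside the pipeline means its vocabulary and IDF weights are refit on each training fold, so the held-out fold never leaks into feature extraction. Scoring by F1 on the spam class, rather than plain accuracy, is a common choice because spam is usually the minority class in real SMS corpora.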
International STEM Journal, Volume 6 No.1, June 2025, 10-27.