“A HYBRID NAIVE BAYES–SBERT ENSEMBLE FOR ROBUST SMS SPAM AND PHISHING DETECTION”
Keywords:
ensemble, Naïve Bayes, SMS Spam
Abstract
The growth of text messaging has sharply increased the number of mobile users exposed to spammers and phishers. Unwanted messages disrupt communication and threaten both user privacy and trust in digital communication systems, yet building accurate and reliable detection mechanisms remains difficult. This paper presents a machine learning framework that combines classical and modern techniques to detect spam and phishing messages. The method pairs a Naive Bayes (NB) classifier, which operates on lexical features extracted from the message content, with Sentence-BERT (SBERT) embeddings that capture deeper semantic and contextual information within the messages. To exploit the strengths of both models, their outputs are combined by probabilistic averaging, which improves the robustness of the overall classification. Experiments used a combined dataset of 9,614 English SMS and e-mail messages, with 20% of the data held out for testing. The proposed ensemble achieved 96.72% accuracy, 95.13% precision, 87.83% recall, and a 91.33% F1-score for spam and phishing detection. This performance exceeds that of the individual classifiers, indicating that hybrid ensemble learning can detect surface-level lexical patterns while also capturing deeper semantic context. The findings suggest that hybrid models are a strong approach to spam and phishing detection in mobile messaging.
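The probabilistic averaging step can be illustrated with a minimal sketch. It assumes that each model has already produced a spam probability for every message. The abstract does not state the ensemble weights or the decision threshold, so the equal weighting, the 0.5 threshold, the function names, and the example probabilities below are all illustrative assumptions, not the configuration used in the experiments. The last helper checks that the reported F1-score is the harmonic mean of the reported precision and recall.

```python
from typing import List, Sequence


def average_probabilities(
    p_nb: Sequence[float],
    p_sbert: Sequence[float],
    w_nb: float = 0.5,  # assumed equal weighting; not specified in the abstract
) -> List[float]:
    """Weighted average of per-message spam probabilities from two models."""
    if len(p_nb) != len(p_sbert):
        raise ValueError("both models must score the same number of messages")
    if not 0.0 <= w_nb <= 1.0:
        raise ValueError("w_nb must lie in [0, 1]")
    return [w_nb * a + (1.0 - w_nb) * b for a, b in zip(p_nb, p_sbert)]


def predict_labels(p_spam: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Label a message 1 (spam/phishing) if its averaged probability reaches
    the threshold, else 0. The 0.5 threshold is an assumption."""
    return [1 if p >= threshold else 0 for p in p_spam]


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical probabilities for three messages (not taken from the paper).
    p_nb = [0.92, 0.10, 0.35]     # lexical Naive Bayes model
    p_sbert = [0.80, 0.05, 0.75]  # classifier on SBERT embeddings
    averaged = average_probabilities(p_nb, p_sbert)
    print(predict_labels(averaged))  # [1, 0, 1]

    # Consistency check on the reported metrics.
    print(round(100 * f1_score(0.9513, 0.8783), 2))  # 91.33
```

In the third hypothetical message the lexical model alone would predict a legitimate message (0.35), while the averaged score (0.55) crosses the threshold because the semantic model is confident. This is the kind of case in which averaging two complementary models can help.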