A COMPARATIVE ANALYSIS OF DEEP LEARNING METHODS FOR SLANG DETECTION IN TWITTER DATA
Keywords: Slang word, Twitter, Deep Learning, Sentiment Analysis

Abstract
Informal communication is expanding online, particularly on social networking sites such as Twitter. This has caused the widespread use of slang, abbreviations, and non-standard language, which makes standard Natural Language Processing (NLP) techniques harder to apply. This study addresses automated slang detection in tweets with state-of-the-art machine learning (ML) and deep learning (DL) models: Logistic Regression, Decision Tree, Random Forest, Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), BERT, ALBERT, DistilBERT, and RoBERTa. The data are normalized through cleaning, lemmatization, and tokenization to standardize the input. Accuracy, precision, recall, F1-score, and AUC-ROC are used to evaluate how well each model performs. Results showed that RoBERTa performed best among all models in terms of accuracy, achieving 93.99%, followed closely by DistilBERT (92.63%) and GRU (82.65%), demonstrating the advantage of transformer architectures and sequential models at handling informal language and noisy English tweets. Traditional ML models such as Logistic Regression (78.56%) and Decision Tree (71.13%) showed moderate performance but provided baseline interpretability. The results indicate that context-sensitive deep learning models, particularly transformer-based models such as RoBERTa and DistilBERT, achieve strong performance on slang detection tasks. The study establishes a baseline, conducts a comparative analysis of several models, and offers a useful perspective on investigating automated slang detection from multiple angles.
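The normalization steps mentioned above (cleaning, tokenization) can be sketched as follows. This is an illustrative Python sketch, not the paper's actual pipeline: the regex rules and function names are assumptions, and the lemmatization step would typically be delegated to a library such as NLTK's WordNetLemmatizer (omitted here to keep the snippet self-contained).

```python
import re

def clean_tweet(text: str) -> str:
    """Strip Twitter-specific noise before tokenization."""
    # Remove URLs and @-mentions; keep hashtag words but drop the '#'
    text = re.sub(r"https?://\S+", "", text)
    text = re.sub(r"@\w+", "", text)
    text = text.replace("#", "")
    # Collapse repeated whitespace and lowercase
    return re.sub(r"\s+", " ", text).strip().lower()

def tokenize(text: str) -> list[str]:
    """Keep alphabetic tokens and apostrophes; drop other punctuation."""
    return re.findall(r"[a-z']+", text)

def normalize(text: str) -> list[str]:
    # Lemmatization would be applied here in a fuller pipeline
    return tokenize(clean_tweet(text))

print(normalize("OMG @user this is sooo lit!! #slang https://t.co/x"))
# → ['omg', 'this', 'is', 'sooo', 'lit', 'slang']
```

The resulting token lists would then be fed to the ML models via a vectorizer (or, for the transformer models, the raw cleaned text would go through the model's own subword tokenizer instead).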
