Paper 01

A novel approach for phishing detection using NLP

University of Alabama | 2025

Phishing emails continue to pose a pressing, present-day threat to global cybersecurity. This paper investigates how effectively various natural language processing (NLP) and machine learning techniques detect phishing emails. Three NLP techniques (n-grams, bag-of-words, and term frequency-inverse document frequency, or TF-IDF) are evaluated across three machine learning models: logistic regression, random forest, and support vector machine (SVM). We apply 18 unique model-vectorization combinations to a dataset of almost 82,500 emails and evaluate each on accuracy, precision, recall, and F1 score. The strongest configuration, TF-IDF with a (1,2) n-gram range and an SVM, reaches 99.19% accuracy and reduces the error rate by 38.17% relative to the second-best configuration.

Index Terms - cybersecurity, detection, feature extraction, logistic regression, machine learning, natural language processing, phishing email, random forest, spam, support vector machines
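
The vectorization step of the strongest configuration (TF-IDF over unigrams and bigrams) and the four evaluation metrics can be illustrated with a small, dependency-free sketch. It assumes a simple lowercase word tokenizer and the smoothed IDF weighting ln((1 + N) / (1 + df)) + 1 with L2 normalization, which is scikit-learn's default; the abstract does not state which tokenizer or IDF variant the authors used. The function names `ngrams`, `fit_idf`, `tfidf_vector`, and `classification_metrics`, and the label encoding (1 = phishing), are hypothetical. The SVM itself is omitted; the sparse vectors produced here would be its input.

```python
import math
import re
from collections import Counter


def ngrams(text, ngram_range=(1, 2)):
    """Lowercase, split into word tokens, and emit every n-gram with lo <= n <= hi.

    Assumption: a simple \\w+ tokenizer; the paper's tokenizer is not specified.
    """
    tokens = re.findall(r"\w+", text.lower())
    lo, hi = ngram_range
    grams = []
    for n in range(lo, hi + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def fit_idf(docs, ngram_range=(1, 2)):
    """Smoothed IDF over the training corpus: ln((1 + N) / (1 + df)) + 1."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(ngrams(doc, ngram_range)))  # count each term once per document
    return {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(doc, idf, ngram_range=(1, 2)):
    """Raw term counts times IDF, L2-normalised; terms unseen in training are dropped."""
    counts = Counter(g for g in ngrams(doc, ngram_range) if g in idf)
    weights = {term: tf * idf[term] for term, tf in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {term: w / norm for term, w in weights.items()} if norm else {}


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, plus precision, recall and F1 for the positive (here: phishing) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy corpus, purely illustrative; not data from the paper.
    idf = fit_idf(["Verify your account now", "Meeting notes attached"])
    print(sorted(tfidf_vector("Please verify your account", idf)))
    # ['account', 'verify', 'verify your', 'your', 'your account']
    print(classification_metrics([1, 1, 0, 0], [1, 0, 0, 1]))
    # {'accuracy': 0.5, 'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```

With scikit-learn installed, a comparable pipeline would be `TfidfVectorizer(ngram_range=(1, 2))` followed by `LinearSVC()`. Note that scikit-learn's default token pattern ignores single-character tokens, so its vocabulary would differ slightly from this sketch's.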

Rapid Literature Review of Reinforcement Learning and Large Language Model Techniques for Software Engineering Testing and Bug Detection