A New Feature Selection in Email Spam Detection by Particle Swarm Optimization and Fruit Fly Optimization Algorithms

Document Type : Machine Learning - Monsefi

Authors

1 Urmia Branch, Islamic Azad University, Urmia, IRAN

2 Department of Computer Engineering, Urmia Branch, Islamic Azad University, Urmia, IRAN

Abstract

The spread of the internet, email, and social networking has brought new issues that leave users vulnerable to attackers. Internet users receive large numbers of unwanted emails, and their data privacy and security are at risk. Spam is often sent to users by intruders and marketers, and in most cases its purpose is harassment and abuse of user data. As attacks on computer networks increase, efforts to secure those networks and to detect spam emails become more important. Hackers obtain users' personal information and account details and misuse their identities for malicious and subversive actions. Intruders attempt to expose, remove, or change user information by breaking into encrypted data. It is therefore very important to detect spam at an early stage. In this paper, a new approach to email spam detection is proposed, based on a hybridization of Particle Swarm Optimization (PSO) with Fruit Fly Optimization (FFO). The paper presents a Feature Selection (FS) method based on PSO, which reduces dimensionality and improves the accuracy of email spam classification. The PSO searches the feature space for the best feature subsets. Experimental results on the public Spambase dataset show that the accuracy of the proposed model is 92.21%, which is higher than that of other models such as PSO, Genetic Algorithm (GA), and Ant Colony Optimization (ACO).
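To make the idea of PSO searching the feature space concrete, the following is a minimal sketch of a common binary PSO variant with a sigmoid transfer function, in which each particle is a 0/1 mask over the features. It is an illustration only and not the authors' implementation: the abstract does not describe the FFO hybridization, the classifier, or the fitness function, so none of those are reproduced here. The synthetic data, the nearest-centroid scorer, the weight alpha, and the parameters w, c1, and c2 are all assumptions chosen to keep the example self-contained.

```python
import math
import random


def make_toy_data(n_samples=200, n_features=10, n_informative=3, seed=0):
    """Synthetic stand-in for a spam dataset (an assumption, not Spambase):
    only the first n_informative features carry class signal."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        label = rng.randint(0, 1)
        X.append([rng.gauss(2.0 * label if j < n_informative else 0.0, 1.0)
                  for j in range(n_features)])
        y.append(label)
    return X, y


def accuracy(mask, X, y):
    """Nearest-centroid accuracy on the selected features.
    Uses training accuracy for brevity; a real study would hold out data."""
    idx = [j for j, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0
    centroids = {}
    for c in (0, 1):
        rows = [x for x, t in zip(X, y) if t == c]
        centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in idx]
    correct = 0
    for x, t in zip(X, y):
        dist = {c: sum((x[j] - m) ** 2 for j, m in zip(idx, centroids[c]))
                for c in (0, 1)}
        if min(dist, key=dist.get) == t:
            correct += 1
    return correct / len(y)


def fitness(mask, X, y, alpha=0.99):
    """Hypothetical fitness: mostly accuracy, plus a small reward for
    dropping features. alpha is an assumed weight."""
    if not any(mask):
        return 0.0
    return alpha * accuracy(mask, X, y) + (1 - alpha) * (1 - sum(mask) / len(mask))


def binary_pso(X, y, n_particles=10, n_iters=20, w=0.7, c1=1.5, c2=1.5, seed=1):
    """Binary PSO: velocities are real-valued, positions are 0/1 masks
    sampled through a sigmoid of the velocity."""
    rng = random.Random(seed)
    dim = len(X[0])
    pos = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, X, y) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for j in range(dim):
                v = (w * vel[i][j]
                     + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                     + c2 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-6.0, min(6.0, v))  # clamp to avoid saturation
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob else 0
            f = fitness(pos[i], X, y)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


if __name__ == "__main__":
    X, y = make_toy_data()
    mask, score = binary_pso(X, y)
    print("selected features:", [j for j, bit in enumerate(mask) if bit])
    print("fitness:", round(score, 4))
```

In this sketch, a selected subset is scored by how well a simple classifier separates the classes using only those features, so masks that keep informative features and drop noisy ones tend to score higher. Swapping in a different classifier or a held-out validation split only requires changing the accuracy function.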
 
