Novel Correlation-based Feature Selection Approach using Manta Ray Foraging Optimization

Document Type : Machine Learning - Monsefi


Computer Science, Shahid Bahonar University of Kerman, Kerman, Iran.


Recent advances in science, engineering, and technology have produced massive datasets. Machine learning and data mining techniques often perform poorly on such datasets because they contain redundant, noisy, and irrelevant features. Feature selection reduces the dimensionality of a dataset by selecting the most relevant attributes while also increasing classification accuracy. Meta-heuristic optimization techniques have become increasingly popular for feature selection in recent years because they can overcome the limitations of traditional optimization methods. This paper presents a binary version of the Manta Ray Foraging Optimization (MRFO) algorithm for feature selection. To reduce computational cost and running time, we also incorporate Spearman's correlation coefficient into the proposed method, yielding a variant called Correlation Based Binary Manta Ray Foraging (CBBMRF). CBBMRF eliminates highly positively correlated features at the start of the search, which avoids unnecessary computation and leads to faster subset selection. The presented algorithms are compared with five state-of-the-art meta-heuristics on 10 standard UCI datasets. The results show that the proposed algorithms achieve superior performance on feature selection problems.
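The correlation-based pre-filtering step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the correlation threshold or the order in which features are examined, so the 0.9 cutoff, the greedy left-to-right scan, and the helper names (`average_ranks`, `spearman`, `drop_correlated`) are all assumptions made for illustration. Only strongly positive correlations trigger removal, as the abstract describes.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def drop_correlated(columns, threshold=0.9):
    """Greedy filter (assumed strategy): keep a feature only if its Spearman
    correlation with every already-kept feature is at most `threshold`."""
    kept = []
    for j, col in enumerate(columns):
        if all(spearman(col, columns[k]) <= threshold for k in kept):
            kept.append(j)
    return kept


# Toy data: each inner list is one feature (column) over five samples.
columns = [
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 11],   # increases monotonically with feature 0
    [5, 3, 4, 1, 2],    # negatively related to feature 0
    [1, 4, 9, 16, 25],  # increases monotonically with feature 0
]
kept = drop_correlated(columns)
print(kept)  # [0, 2]
```

In this sketch, the binary optimizer would then search only over the retained feature indices, which is where the saving in computation would come from.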

