A diagnostic system for detecting COVID-19 patients depending on lexicon semantic and Biterm Topic Model-Based Feature Selection on WhatsApp Messages Classification

Document Type: Original Article

Authors

¹ Ministry of Education, General Directorate for Education in Al-Qadisiyah, Iraq.

² University of Al-Qadisiyah, College of Agriculture

DOI: 10.22067/cke.2025.91490.1145

Abstract

COVID-19 has created an urgent need for innovative detection methods. This study presents a novel approach to identifying potential COVID-19 patients by analyzing their WhatsApp messages with natural language processing techniques. The methodology combines Word2Vec embeddings with lexical-semantic enrichment from ConceptNet, so that the system can detect subtle linguistic patterns associated with COVID-19 symptoms and experiences. Messages pass through six stages: data collection, Word2Vec embedding, lexical-semantic enhancement, vector-space model creation, Biterm Topic Model-based feature selection, and Naive Bayes classification. By enriching the language model with synonyms and capturing complex semantic relationships, the approach can flag potential COVID-19 cases from the way people describe their symptoms and experiences in everyday conversations. We tested the system on a sample of diverse WhatsApp messages and obtained promising results in distinguishing messages from COVID-19 patients from those of healthy individuals. The system identified both explicit statements of COVID-19 status and more subtle descriptions of symptoms, and it classified non-COVID-related messages correctly with high confidence. Although the method shows potential as a non-invasive and scalable screening tool, it should be viewed as a complement to existing diagnostic approaches rather than a replacement. Further large-scale testing is needed to validate the system's reliability and effectiveness in real-world applications.
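The stages named in the abstract can be sketched on toy data. This is a minimal illustration under stated assumptions, not the authors' implementation: the `SYNONYMS` dictionary is a hypothetical stand-in for ConceptNet lookups, the Word2Vec embedding step is omitted, the example messages are invented, and feature selection uses plain biterm co-occurrence counts as a simplified proxy for a fitted Biterm Topic Model. The classifier is a multinomial Naive Bayes with Laplace smoothing.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical synonym lexicon standing in for ConceptNet lookups.
SYNONYMS = {"fever": ["temperature"], "cough": ["coughing"], "tired": ["fatigue"]}

def tokenize(message):
    """Lowercase, split on whitespace, and strip simple punctuation."""
    words = (w.strip(".,!?").lower() for w in message.split())
    return [w for w in words if w]

def enrich(tokens):
    """Lexical-semantic enrichment: append lexicon synonyms after each token."""
    out = []
    for t in tokens:
        out.append(t)
        out.extend(SYNONYMS.get(t, []))
    return out

def biterms(tokens):
    """Unordered pairs of distinct words that co-occur in one short message."""
    return list(combinations(sorted(set(tokens)), 2))

def select_features(docs, min_count=2):
    """Keep words from biterms seen in at least min_count messages.

    Assumption: a count-based proxy, not a fitted Biterm Topic Model.
    """
    counts = Counter(b for d in docs for b in biterms(d))
    return {w for pair, n in counts.items() if n >= min_count for w in pair}

def train_nb(docs, labels, vocab):
    """Collect per-class counts of the selected words."""
    word_counts = {c: Counter() for c in set(labels)}
    for d, c in zip(docs, labels):
        word_counts[c].update(w for w in d if w in vocab)
    return word_counts, Counter(labels), vocab

def predict(model, tokens):
    """Return the class with the highest Laplace-smoothed log posterior."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c in sorted(class_counts):
        score = math.log(class_counts[c] / total)
        denom = sum(word_counts[c].values()) + len(vocab)
        for w in tokens:
            if w in vocab:
                score += math.log((word_counts[c][w] + 1) / denom)
        if score > best_score:
            best, best_score = c, score
    return best

# Invented toy messages, for illustration only.
MESSAGES = [
    ("I have a fever and a dry cough", "covid"),
    ("lost my sense of smell and feel tired", "covid"),
    ("fever again tonight and the cough is worse", "covid"),
    ("see you at the football match tonight", "other"),
    ("can you send me the homework file", "other"),
    ("the match was great see you tomorrow", "other"),
]
docs = [enrich(tokenize(m)) for m, _ in MESSAGES]
labels = [c for _, c in MESSAGES]
vocab = select_features(docs)
model = train_nb(docs, labels, vocab)
print(predict(model, enrich(tokenize("my cough and fever are getting worse"))))
```

Counting biterms per message rather than single words is what makes the selection step suited to short texts: a word is kept only if it repeatedly co-occurs with another word, which filters out terms that appear in a single message.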

Main Subjects

Computer Science


Computer and Knowledge Engineering

Volume 8, Issue 1 - Serial Number 15
April 2025
Pages 53-64