A diagnostic system for detecting COVID-19 patients depending on lexicon semantic and Biterm Topic Model-Based Feature Selection on WhatsApp Messages Classification

Document Type : Original Article

Authors

1 Ministry of Education, General Directorate for Education in Al-Qadisiyah, Iraq.

2 University of Al-Qadisiyah, College of Agriculture

Abstract

COVID-19 has created an urgent need for innovative detection methods. This study presents a novel approach to identifying potential COVID-19 patients by analyzing their WhatsApp messages with natural language processing techniques. The methodology combines Word2Vec embeddings with lexical-semantic enrichment from ConceptNet, so that the system can detect subtle linguistic patterns associated with COVID-19 symptoms and experiences. The system processes WhatsApp messages in six stages: data collection, Word2Vec embedding, lexical-semantic enhancement, vector-space model creation, Biterm Topic Model-based feature selection, and Naive Bayes classification. By enriching the language model with synonyms and capturing semantic relationships between terms, the approach can identify potential COVID-19 cases from the way people describe their symptoms and experiences in everyday conversations. We tested the system on a sample of diverse WhatsApp messages and obtained promising results in distinguishing messages written by COVID-19 patients from those written by healthy individuals. The system identified both explicit statements of COVID-19 status and more subtle descriptions of symptoms, and it classified messages unrelated to COVID-19 correctly with high confidence. Although the method shows potential as a non-invasive and scalable screening tool, it should be viewed as a complement to existing diagnostic approaches rather than a replacement. Further large-scale testing is needed to validate the system's reliability and effectiveness in real-world applications.
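
As a rough illustration of three of the stages named above (synonym enrichment, biterm extraction, and Naive Bayes classification), the following Python sketch may help. It is not the authors' implementation: the `SYNONYMS` table is a hypothetical stand-in for ConceptNet lookups, the training messages are invented for the example, and the Word2Vec embedding and the fitting of the Biterm Topic Model are omitted entirely. Only the extraction of biterms (unordered word pairs co-occurring in one short message) is shown.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy lexicon; the paper uses ConceptNet for this step.
SYNONYMS = {"fever": ["temperature"], "cough": ["coughing"], "tired": ["fatigue"]}

def tokenize(message):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    words = (w.strip(".,!?").lower() for w in message.split())
    return [w for w in words if w]

def enrich(tokens):
    """Append the listed synonyms of every token (lexical-semantic enrichment)."""
    out = list(tokens)
    for t in tokens:
        out.extend(SYNONYMS.get(t, []))
    return out

def biterms(tokens):
    """Unordered pairs of distinct words in one message, the unit a Biterm Topic Model works on."""
    return list(combinations(sorted(set(tokens)), 2))

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.n_docs = len(labels)
        self.counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.counts[label].update(tokens)
        self.vocab = {w for c in self.classes for w in self.counts[c]}
        return self

    def predict(self, tokens):
        scores = {}
        for c in self.classes:
            total = sum(self.counts[c].values())
            score = math.log(self.prior[c] / self.n_docs)
            for w in tokens:
                if w in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.counts[c][w] + 1) / (total + len(self.vocab)))
            scores[c] = score
        return max(scores, key=scores.get)

def preprocess(message):
    return enrich(tokenize(message))

# Invented training messages, for illustration only.
TRAIN = [
    ("I have a fever and a dry cough", "covid"),
    ("lost my sense of smell and feel tired", "covid"),
    ("see you at the match tonight", "other"),
    ("dinner was great thanks for cooking", "other"),
]
model = NaiveBayes().fit([preprocess(m) for m, _ in TRAIN], [y for _, y in TRAIN])

print(model.predict(preprocess("my temperature is high and I keep coughing")))
print(biterms(tokenize("dry cough")))
```

In this toy setup the query message never uses the words "fever" or "cough", yet it can still share vocabulary with the training data because those training messages were expanded with "temperature" and "coughing". That is the effect the synonym enrichment stage is meant to achieve.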


 
 