Authors
- Sonia Saini
- Ruchi Agarwal
- SP Singh
- Punit Gupta
- Ankit Vidhyarthi
- Rohit Verma
Abstract
Social media has driven an exponential rise in connectivity. Health data that was once confined to hospital or clinical records is now shared as text over social media. Information and updates on pandemic outbreaks, clinical visit results, and general health are being shared and analyzed. The data is shared more frequently and in various formats, such as images, text, documents, and videos. With fast streaming systems and no constraints on storage space, this shared rich media data is both voluminous and informative. Shared health data, such as discussions of ailments, hospital visits, general well-being updates, and drug research updates from the official Twitter handles of pharmaceutical companies and healthcare organizations, poses a unique challenge for analysis. The text describing an ailment ranges from formal medical jargon to common names, although the intent is the same: to identify the disease or ailment term. This paper focuses on extracting and analyzing health-related data exchanged on social media and introduces an Augmented Ensemble Model (AEM), which identifies frequently shared health topics and discussions on social networks in order to predict emerging health trends. The analytical model works with chronological datasets to classify text by health-related topic. This hybrid model uses text data augmentation to address class imbalance among health terms and employs a clustering technique for location-based aggregation. An algorithm for a word vector embedding model of health terms is formulated, and this word vector model is then used in text data augmentation to reduce the class imbalance. We evaluate the accuracy of the classifiers by constructing a machine learning pipeline.
For our Augmented Ensemble Model, text classification accuracy is evaluated after augmentation using a voting ensemble technique, and higher accuracy is observed. Emerging health trends are analyzed via temporal classification and location-wise aggregation of the health terms. The model demonstrates that a text-augmented ensemble machine learning approach for health topics is more efficient than conventional machine learning classification techniques.
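The two ideas the abstract names, augmenting minority-class text to reduce class imbalance and combining classifiers by voting, can be illustrated with a minimal sketch. This is not the authors' implementation: the `SYNONYMS` table is a hypothetical stand-in for nearest neighbours in a health-term word vector model, and the function names `augment`, `balance`, and `hard_vote` are invented for illustration.

```python
import random
from collections import Counter

# Hypothetical synonym table. In the approach described above, candidates
# would come from a word vector model of health terms; this is an assumption.
SYNONYMS = {
    "flu": ["influenza"],
    "fever": ["pyrexia", "high temperature"],
    "headache": ["migraine"],
}


def augment(text, rng):
    """Return a copy of text with every known term swapped for a synonym."""
    return " ".join(
        rng.choice(SYNONYMS[word]) if word in SYNONYMS else word
        for word in text.split()
    )


def balance(samples, seed=0):
    """Add augmented copies of minority-class texts until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)
    by_label = {}
    for text, label in samples:
        by_label.setdefault(label, []).append(text)
    target = max(len(texts) for texts in by_label.values())
    balanced = list(samples)
    for label, texts in by_label.items():
        for i in range(target - len(texts)):
            # Cycle through the original texts of this class.
            balanced.append((augment(texts[i % len(texts)], rng), label))
    return balanced


def hard_vote(predictions):
    """Return the label predicted by the most classifiers."""
    return Counter(predictions).most_common(1)[0][0]


if __name__ == "__main__":
    data = [
        ("new drug trial announced", "research"),
        ("vaccine study results", "research"),
        ("clinical trial update", "research"),
        ("flu and fever", "illness"),
    ]
    for text, label in balance(data):
        print(label, "|", text)
    # Each entry stands for one classifier's prediction on the same post.
    print(hard_vote(["illness", "research", "illness"]))
```

In a fuller pipeline the votes would come from trained classifiers fitted on the balanced data; the sketch only shows how the augmentation and voting steps fit together.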
Suggested Citation
Sonia Saini & Ruchi Agarwal & SP Singh & Punit Gupta & Ankit Vidhyarthi & Rohit Verma, 2025.
"Augmented Ensemble Model (AEM) for health trends prediction on social networks,"
PLOS ONE, Public Library of Science, vol. 20(6), pages 1-18, June.
Handle:
RePEc:plo:pone00:0323449
DOI: 10.1371/journal.pone.0323449