Enhancing Vaccine Adverse Event Detection on Social Media through LLM-Driven Synthetic Data Augmentation

Abdusalam Nwesri; Mai Elbaabaa; Nabila Shinbir; Hasan Ebrahem; Marwa Solla

doi:10.65568/gujes.2026.020112

Authors

Abdusalam Nwesri, University of Tripoli
Mai Elbaabaa, University of Tripoli
Nabila Shinbir, College of Science and Technology
Hasan Ebrahem, University of Tripoli
Marwa Solla, University of Tripoli

Keywords:

Large language models, vaccine adverse event detection, data augmentation

Abstract

This paper evaluates the impact of synthetic data augmentation on the detection of personal reactions to vaccines in social media posts. The study builds on the University of Tripoli team's participation in Task 6 of the 10th Social Media Mining for Health (#SMM4H) shared tasks. After establishing a baseline by fine-tuning six large language models (LLMs), we analyze how augmenting the training set with synthetically generated examples affects classification metrics. Our experiments show that synthetic data augmentation markedly improves performance across all models, with smaller models benefiting the most.
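The augmentation step summarized in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not specify the prompts, generator models, or number of synthetic posts, so `generate` is a hypothetical callable standing in for an LLM request, `augment_training_set` and `per_example` are illustrative names, and the 1/0 label encoding is assumed.

```python
import random
from typing import Callable, List, Tuple

# A labeled post: (text, label). Assumed encoding: 1 = personal vaccine reaction, 0 = other.
Example = Tuple[str, int]


def augment_training_set(
    train: List[Example],
    generate: Callable[[str, int], str],
    per_example: int = 1,
    seed: int = 0,
) -> List[Example]:
    """Return the real training posts plus synthetic posts that keep each seed's label.

    `generate(text, label)` is a hypothetical stand-in for an LLM call that writes
    a new post with the same label as the seed post; it is not a real library API.
    Only the training split should be passed in, so evaluation stays on real posts.
    """
    synthetic: List[Example] = []
    for text, label in train:
        for _ in range(per_example):
            # The synthetic post inherits the label of the post it was generated from.
            synthetic.append((generate(text, label), label))
    # Build a new list so the caller's original training set is left unchanged.
    augmented = train + synthetic
    # Shuffle deterministically so real and synthetic posts are interleaved.
    random.Random(seed).shuffle(augmented)
    return augmented


if __name__ == "__main__":
    toy_train = [("arm sore after my booster", 1), ("new vaccine site opens downtown", 0)]
    # Toy generator used only for demonstration; a real setup would prompt an LLM here.
    toy_generate = lambda text, label: "paraphrase: " + text
    for post in augment_training_set(toy_train, toy_generate, per_example=2):
        print(post)
```

Keeping the generator as a plain callable lets the same function be exercised with a stub like the one above and later connected to any LLM client without changing the augmentation logic.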
