RECOGNIZING HEALTH CONCEPTS IN TWITTER DATA USING LARGE LANGUAGE MODEL’S

Chavan, Soniya Sagar

doi:10.25394/PGS.28966583.v1

thesis

posted on 2025-05-08, 20:44 authored by Soniya Sagar Chavan

This thesis presents a structured framework leveraging large language models (LLMs)—GPT-4-0613 via LangChain, GPT-4 Turbo, and Gemini 2.0 Flash—for extracting, normalizing, and categorizing COVID-19 symptoms from informal Twitter posts. Using a pre-annotated dataset of 635 tweets as ground truth, the study evaluates each model’s ability to identify symptoms and temporal references expressed through varied, often non-clinical language.

To address LLM non-determinism, the framework introduces a consensus mechanism across three inference runs per model. Outputs are semantically matched, normalized, and categorized using prompt-driven Gemini 2.0 Flash models to ensure consistency across all stages. The evaluation metrics include accuracy, precision, recall, and F1-score, with GPT-4-0613 demonstrating the highest overall performance.
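As a rough illustration of the consensus and scoring steps described above, the following sketch keeps a symptom only when it appears in at least two of three runs and then scores the agreed set against a gold annotation. The two-of-three threshold, the function names, the example symptoms, and the use of simple lowercase string matching are all assumptions made for this sketch; the thesis itself uses prompt-driven semantic matching and normalization, which is not reproduced here.

```python
from collections import Counter


def consensus(runs, min_votes=2):
    """Keep symptoms reported in at least `min_votes` runs (assumed threshold)."""
    votes = Counter()
    for run in runs:
        # Normalize crudely and count each symptom at most once per run.
        votes.update({s.strip().lower() for s in run})
    return {s for s, n in votes.items() if n >= min_votes}


def precision_recall_f1(predicted, gold):
    """Set-based precision, recall, and F1 for one tweet."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical outputs of three inference runs on the same tweet.
runs = [
    ["fever", "cough", "headache"],
    ["Fever", "cough"],
    ["fever", "fatigue", "cough "],
]
agreed = consensus(runs)

# Hypothetical gold annotation for that tweet.
gold = {"fever", "cough", "fatigue"}
precision, recall, f1 = precision_recall_f1(agreed, gold)
```

In this toy case, "headache" and "fatigue" each appear in only one run and are dropped, which raises precision at the cost of recall. A stricter or looser threshold would shift that trade-off.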

The study further visualizes results through a 3D symptom-day-category data cube to support trend analysis. Findings highlight the potential of LLMs, when combined with prompt engineering and ensemble strategies, to enhance public health surveillance from social media data streams. This reproducible pipeline offers a scalable solution for timely health monitoring and can generalize to other diseases and platforms.
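One simple way to picture a symptom-day-category cube is as a table of counts keyed by those three dimensions, which can then be summed along any axis for trend analysis. The sketch below is only a minimal stand-in under stated assumptions: the records, the category labels, and the interpretation of "day" as an integer day index are hypothetical and are not taken from the thesis.

```python
from collections import Counter

# Hypothetical (symptom, day, category) records after extraction and normalization.
records = [
    ("fever", 1, "systemic"),
    ("cough", 1, "respiratory"),
    ("fever", 2, "systemic"),
    ("cough", 2, "respiratory"),
    ("cough", 2, "respiratory"),
    ("loss of smell", 4, "neurological"),
]

# The "cube": one count per (symptom, day, category) cell.
cube = Counter(records)


def roll_up(cube, axis):
    """Sum cell counts along one dimension (0=symptom, 1=day, 2=category)."""
    totals = Counter()
    for key, count in cube.items():
        totals[key[axis]] += count
    return totals


by_day = roll_up(cube, axis=1)       # mentions per day
by_category = roll_up(cube, axis=2)  # mentions per category
```

Rolling up along the day axis gives a per-day mention count, while individual cells such as `cube[("cough", 2, "respiratory")]` remain available for finer-grained views.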

History

Degree Type

Master of Science

Department

Computer and Information Technology

Campus location

Hammond

Advisor/Supervisor/Committee Chair

Keyuan Jiang

Additional Committee Member 2

Ashok Vardhan Raja

Additional Committee Member 3

George Stefanek
