Evaluating the Accuracy and Readability of ChatGPT-4o’s Responses to Patient-Based Questions about Keratoconus
This study aimed to evaluate the accuracy and readability of responses generated by ChatGPT-4o, an advanced large language model, to frequently asked patient-centered questions about keratoconus.
A cross-sectional, observational study was conducted in which ChatGPT-4o answered 30 questions that patients with keratoconus might commonly ask. The accuracy of the responses was evaluated by two board-certified ophthalmologists and scored on a scale of 1 to 5. Readability was assessed using the Simple Measure of Gobbledygook (SMOG), Flesch-Kincaid Grade Level (FKGL), and Flesch Reading Ease (FRE) scores. Descriptive, treatment-related, and follow-up-related questions were analyzed, and statistical comparisons between these categories were performed.
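For reference, the three readability indices are not restated in the study; the sketch below gives their standard published formulas, where W, S, and Y denote word, sentence, and syllable counts and P the number of words of three or more syllables.

```latex
\begin{align}
\text{FRE}  &= 206.835 - 1.015\,\frac{W}{S} - 84.6\,\frac{Y}{W} \\
\text{FKGL} &= 0.39\,\frac{W}{S} + 11.8\,\frac{Y}{W} - 15.59 \\
\text{SMOG} &= 1.0430\,\sqrt{P \cdot \frac{30}{S}} + 3.1291
\end{align}
```

Lower FRE values indicate harder text, while FKGL and SMOG approximate the US school grade level needed to comprehend it.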
The mean accuracy score for the responses was 4.48 ± 0.57 on a 5-point Likert scale, and interrater reliability was strong (intraclass correlation coefficient = 0.769). Mean readability scores were 15.49 ± 1.74 (SMOG), 14.95 ± 1.95 (FKGL), and 27.41 ± 9.71 (FRE), indicating that a high level of education is required to comprehend the responses. Accuracy did not differ significantly among the question categories (p = 0.161), but readability varied significantly, with treatment-related questions being the easiest to understand.
ChatGPT-4o provides highly accurate responses to patient-centered questions about keratoconus, though the complexity of its language may limit accessibility for the general population. Further development is needed to enhance the readability of AI-generated medical content.