Background
Psychosocial autopsy is a retrospective study of suicide aimed at identifying emerging themes and psychosocial risk factors. It typically relies heavily on qualitative data from interviews or medical documentation. However, qualitative research has often been scrutinized for being prone to bias and is notoriously time- and cost-intensive. Therefore, the current study aimed to investigate whether a Large Language Model (LLM) can feasibly be integrated into qualitative research procedures by evaluating the model's performance in deductively coding and coherently summarizing interview data obtained in a psychosocial autopsy.
Methods
Data from 38 semi-structured interviews conducted with individuals bereaved by the suicide of a loved one were deductively coded by qualitative researchers and by a server-installed LLAMA3 large language model. The model's performance was evaluated on three tasks: (1) binary classification of coded segments, (2) independent classification using a sliding window approach, and (3) summarization of coded data. Intercoder agreement scores were calculated using Cohen's Kappa, and the LLM's summaries were qualitatively assessed using the Constant Comparative Method.
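For readers who want a concrete picture of this kind of workflow, the sketch below is a minimal, hypothetical illustration rather than the study's actual pipeline: it shows how binary codes assigned by an LLM over sliding windows of interview text could be compared against researcher-assigned codes using Cohen's Kappa. The codebook labels, window size, and the `llm` callable are placeholder assumptions.

# Minimal, hypothetical sketch of window-level binary coding and intercoder agreement.
# The codebook, window size, and `llm` callable are illustrative assumptions,
# not the study's actual implementation.
from sklearn.metrics import cohen_kappa_score

CODEBOOK = ["example code A", "example code B"]  # placeholder deductive codes

def sliding_windows(sentences, size=3, step=1):
    """Yield overlapping windows of consecutive interview sentences."""
    for start in range(0, max(len(sentences) - size + 1, 1), step):
        yield " ".join(sentences[start:start + size])

def classify_window(window, code, llm):
    """Ask an LLM whether a code applies to a text window (binary decision).
    `llm` is a placeholder callable wrapping whatever model interface is used."""
    prompt = (
        f"Does the following interview excerpt contain evidence of '{code}'? "
        f"Answer 'yes' or 'no'.\n\nExcerpt: {window}"
    )
    return 1 if llm(prompt).strip().lower().startswith("yes") else 0

def intercoder_agreement(researcher_labels, llm_labels):
    """Cohen's Kappa between the researcher's and the LLM's binary codings."""
    return cohen_kappa_score(researcher_labels, llm_labels)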
Results
The LLM achieved substantial agreement with the researchers on the binary classification task (accuracy: 0.84) and the sliding window task (accuracy: 0.67), although performance varied considerably across codes. The LLM's summaries were typically rich enough for subsequent analysis by the researcher, with around 80% of the summaries independently rated by two researchers as 'adequate' or 'good.' Themes emerging from the qualitative assessment of the summaries included unsolicited elaboration and hallucination.
Conclusion
State-of-the-art LLMs show great potential to support researchers in deductively coding complex interview data, which could substantially reduce the required investment of time and resources. Integrating such models into qualitative research procedures can also facilitate near real-time monitoring. Based on the findings, we recommend a collaborative model in which the LLM's deductive coding is complemented by review, inductive coding, and further interpretation by a researcher. Future research may aim to replicate these findings in different contexts and evaluate models with a larger context size.