This dataset contains the results of a novel, LLM-driven annotation process applied to expert coffee reviews, as presented in the manuscript "Automated Multi-Label Coffee Flavor Classification: A Comparative Study of BERT and TF-IDF using LLM-Driven Data Annotation."
This "Minimal Dataset" is provided to ensure the reproducibility of our findings. It includes:
A unique review_id for each entry.
The original blind_assessment text used as input for the LLM.
The original quantitative sensory scores (Final Score, Aroma, Acidity/Structure, etc.) assigned by human experts, which were used to validate the LLM's annotations quantitatively.
The final 17 columns of binary (0/1) flavor labels as generated by the LLM.
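The layout described above can be sanity-checked with a short script. The following is a minimal sketch, not the authors' tooling. It assumes the dataset is distributed as a CSV file and that the 17 label columns are the last 17 columns of that file; neither is confirmed by this description. The helper names (split_row, load_reviews) and the toy column names label_1 to label_17 are hypothetical, and the toy row is placeholder data rather than an actual review.

```python
import csv
import io

N_LABELS = 17  # the description states 17 binary flavor-label columns


def split_row(row, n_labels=N_LABELS):
    """Split one CSV row (a dict) into metadata and label columns.

    Assumption: the label columns are the trailing n_labels columns.
    Raises ValueError if any label value is not "0" or "1".
    """
    keys = list(row.keys())
    label_keys = keys[-n_labels:]
    labels = {}
    for key in label_keys:
        value = row[key].strip()
        if value not in ("0", "1"):
            raise ValueError(f"non-binary label {value!r} in column {key!r}")
        labels[key] = int(value)
    meta = {key: row[key] for key in keys[:-n_labels]}
    return meta, labels


def load_reviews(fileobj, n_labels=N_LABELS):
    """Read every row from an open CSV file object and validate its labels."""
    reader = csv.DictReader(fileobj)
    return [split_row(row, n_labels) for row in reader]


# Toy in-memory file with placeholder values (not real dataset content).
label_names = [f"label_{i}" for i in range(1, N_LABELS + 1)]  # hypothetical names
header = ["review_id", "blind_assessment", "Final Score"] + label_names
toy_row = ["1", "placeholder review text", "90"] + ["1", "0"] * 8 + ["1"]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerow(toy_row)
buffer.seek(0)

reviews = load_reviews(buffer)
meta, labels = reviews[0]
print(meta["review_id"], sum(labels.values()), "labels set")
```

To run the same check on the real file, pass an opened file object to load_reviews in place of the in-memory buffer, and adjust the column split if the label columns are not the trailing ones.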