This dataset contains the results of an empirical evaluation of 1,200 AI-generated responses to 400 prompt questions set in public service contexts. The responses were assessed by 33 professionals from five institutional categories (adult education, regional policy, local welfare, union representation, and program management). The responses span three temperature settings, and each was rated for understandability and accuracy. The dataset includes both question-level metadata and the scores assigned by the evaluators, and it supports the analysis presented in the paper "Intelligence at Different Temperatures: Experimenting with AI Response Quality in Public Services."
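As a rough illustration of how the two ratings might be compared across temperature settings, the following minimal sketch groups ratings by temperature and averages each criterion. Everything specific in it is an assumption: the tuple layout, the temperature values (0.0, 0.5, 1.0), the rating scale, and the toy rows are invented for illustration and are not taken from the dataset, whose actual file format and column names are not described here.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy rows: (temperature, understandability, accuracy).
# All values are invented for illustration and are NOT from the dataset.
toy_ratings = [
    (0.0, 4, 5),
    (0.0, 5, 4),
    (0.5, 4, 4),
    (0.5, 3, 4),
    (1.0, 3, 2),
    (1.0, 4, 3),
]


def mean_scores_by_temperature(rows):
    """Group ratings by temperature and average each criterion."""
    grouped = defaultdict(list)
    for temperature, understandability, accuracy in rows:
        grouped[temperature].append((understandability, accuracy))
    return {
        temperature: {
            "understandability": mean(u for u, _ in scores),
            "accuracy": mean(a for _, a in scores),
        }
        for temperature, scores in sorted(grouped.items())
    }


summary = mean_scores_by_temperature(toy_ratings)
for temperature, scores in summary.items():
    print(temperature, scores)
```

With the real data, the toy rows would be replaced by the evaluator scores joined to the question-level metadata, using whatever identifiers the dataset files actually provide.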