This hands-on tutorial walks participants through building an automated evaluation pipeline for retrieval-augmented generation (RAG) applications. Using real examples, we’ll define key evaluation criteria and implement simple methods to assess LLM output quality, focusing on completeness, relevance, and hallucinations. Presented at DataNights Course.