Ground-truth annotations covering entities, spatial relationships, temporal interactions (states, events), and contextual elements (weather, traffic), used to validate AI-driven scene description generation
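
To make the annotation categories above concrete, here is a minimal Python sketch of how one such record might be structured, along with one simple way a generated description could be checked against it. The class name `SceneAnnotation`, its field names, and the `entity_recall` helper are all hypothetical and are not taken from the source; the substring-based check is an assumption chosen for brevity, not the validation method actually used.

```python
from dataclasses import dataclass, field


@dataclass
class SceneAnnotation:
    """Hypothetical ground-truth record for one scene (field names are illustrative)."""
    entities: list[str] = field(default_factory=list)
    # (subject, relation, object) triples
    spatial_relationships: list[tuple[str, str, str]] = field(default_factory=list)
    # Temporal interactions, split into (entity, state) and (entity, event) pairs
    states: list[tuple[str, str]] = field(default_factory=list)
    events: list[tuple[str, str]] = field(default_factory=list)
    # Contextual elements such as weather and traffic
    context: dict[str, str] = field(default_factory=dict)


def entity_recall(annotation: SceneAnnotation, description: str) -> float:
    """Fraction of annotated entities that appear in the description.

    Uses a case-insensitive substring match, which is an assumption made
    for this sketch; a real evaluation would likely need something stricter.
    """
    if not annotation.entities:
        return 1.0
    text = description.lower()
    hits = sum(1 for entity in annotation.entities if entity.lower() in text)
    return hits / len(annotation.entities)


example = SceneAnnotation(
    entities=["pedestrian", "bus", "traffic light"],
    spatial_relationships=[("pedestrian", "in front of", "bus")],
    states=[("traffic light", "red")],
    events=[("pedestrian", "crossing")],
    context={"weather": "rain", "traffic": "heavy"},
)

# Two of the three annotated entities are mentioned in this description.
score = entity_recall(example, "A pedestrian crosses in front of a bus in heavy rain.")
```

Keeping each category in its own field lets a validator score entities, relationships, temporal interactions, and context separately instead of collapsing them into a single number.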