Text prompts and videos generated using 5 popular text-to-video models, plus quality metrics including user quality assessments
A collection of 201 prompts used to generate short-form videos with 5 popular text-to-video models: Tune-a-Video, VideoFusion, Text-To-Video Synthesis, Text2Video-Zero and Aphantasia. Each of the 1,005 generated videos is included, along with three automatically calculated quality metrics for each video: naturalness, text similarity between the original prompt and a generated text caption, and inception score. Each video was rated by 24 different people, and the data also includes the mean opinion scores (MOS) for alignment between the generated videos and the original prompts, as well as for perception and overall quality of the video.
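As a rough illustration of how a mean opinion score can be derived from per-rater scores, the sketch below averages the ratings given to a single video. The function name, the 1-to-5 rating scale and the example values are assumptions made for illustration only; they are not taken from the dataset, whose actual file layout and rating scale should be checked before use.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Return the arithmetic mean of one video's ratings (hypothetical helper)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    return mean(ratings)


# Illustrative ratings from 24 raters for one video on an assumed 1-5 scale.
# These values are made up and do not come from the dataset.
example_ratings = [4, 3, 5, 4, 4, 3, 2, 4, 5, 3, 4, 4,
                   3, 4, 5, 4, 3, 4, 4, 2, 3, 4, 5, 4]

mos = mean_opinion_score(example_ratings)
print(f"MOS over {len(example_ratings)} ratings: {mos:.2f}")
```

The same averaging would apply separately to each rated dimension (alignment, perception and overall quality), giving one MOS per dimension per video.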
Please cite this paper if using this dataset. Code implementing the video naturalness calculation is available on GitHub at https://github.com/Chiviya01/Evaluating-Text-to-Video-Models