Annotation of visual and auditory features of a film.

A: The presence of speech, lead singing, background singing, and music was annotated manually from the soundtrack. The zero-crossing rate, spectral spread, entropy, and RMS energy sound features were extracted automatically. B: Spatial high-pass filtering was used to extract high spatial frequencies from the image to quantify the overall complexity of the image. For printing, the contrast of the high-pass filtered image was increased to make the features visible. (Still images courtesy of Aki Kaurismäki and Sputnik Oy.) C: Scoring of the size of body parts/objects followed the shot-size convention used in cinema (long shots = 0, medium/medium close-up shots = 1, and close-up shots = 2). D: Extent of motion was scored on a three-step scale (no motion = 0, intermediate motion = 1, large motion = 2). The overall motion score was calculated as the sum of the shot-size and motion-strength scores for those time points where motion was present.
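
The overall motion score described in C and D can be illustrated with a minimal sketch. The function name `overall_motion_score` and the per-time-point list layout are hypothetical, and the caption does not state what value is assigned when no motion is present; the sketch assumes a score of 0 for those time points.

```python
from typing import List, Sequence

VALID_SCORES = (0, 1, 2)  # three-step scales used for both shot size and motion


def overall_motion_score(shot_size: Sequence[int], motion: Sequence[int]) -> List[int]:
    """Combine per-time-point shot-size and motion-strength scores.

    shot_size: 0 = long shot, 1 = medium/medium close-up, 2 = close-up
    motion:    0 = no motion, 1 = intermediate motion, 2 = large motion
    """
    if len(shot_size) != len(motion):
        raise ValueError("shot_size and motion must have the same length")

    scores = []
    for size, strength in zip(shot_size, motion):
        if size not in VALID_SCORES or strength not in VALID_SCORES:
            raise ValueError("scores must be 0, 1, or 2")
        # Sum the two scores only where motion is present.
        # Assumption (not stated in the caption): the score is 0 otherwise.
        scores.append(size + strength if strength > 0 else 0)
    return scores


if __name__ == "__main__":
    # Four illustrative time points (made-up values, not data from the study).
    print(overall_motion_score([0, 1, 2, 2], [0, 1, 2, 0]))
```

Under these assumptions the score ranges from 0 to 4, so a close-up with large motion counts more than a long shot with the same motion strength.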



CC BY 4.0