Giving shape to the voices of instruments: Audio-visual correspondences between spatial and temporal frequencies (Conference paper)

journal contribution
posted on 2016-06-30, 06:26 authored by Suzy J Styles

Styles SJ (2015) Giving shape to the voices of instruments: Audio-visual correspondences between spatial and temporal frequencies. Paper presented at the Asia Pacific Conference on Vision, July 2015, Singapore.

Document includes conference abstract + key method and data slides from the Oral presentation.

Abstract. What are the sensory drivers of audio-visual sound symbolism? Morton (1994) has suggested that in animals, lower fundamental frequencies are linked to larger shapes, as lower pitch naturally signals larger body plans. Ohala (1994) has suggested that the resonant frequencies of speech also code information about size, due to relative shortening/lengthening of the vocal tract when the lips are retracted (e.g., 'ee') or lengthened (e.g., 'oo'), as reflected in the frequency of the 3rd harmonic (3rd formant). However, few experimental studies have attempted to test the combination of these factors empirically. We asked participants to select a shape from a grid of 11 different shapes, at 11 different sizes, while listening to a range of sounds. The shapes ranged from a highly convoluted shape (high edge complexity) to a simple circle (low edge complexity). Sounds were a high and a low note from the natural range of 6 instruments from 3 different families: strings (violin, cello), double-reed (oboe, bassoon) and brass (trumpet, tuba), along with artificial sine waves and human voices.
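The design described above can be enumerated in a few lines. The following is only an illustrative sketch, not the author's experiment code: the names `FAMILIES`, `REGISTERS`, `instrument_stimuli` and `response_grid` are hypothetical, and the integer indices for shape and size are arbitrary labels. Sine waves and voices are left out because the abstract does not give their counts.

```python
from itertools import product

# Instrument families and members, as listed in the abstract.
FAMILIES = {
    "strings": ["violin", "cello"],
    "double-reed": ["oboe", "bassoon"],
    "brass": ["trumpet", "tuba"],
}
REGISTERS = ["low", "high"]  # one low and one high note per instrument

N_SHAPES = 11  # levels of edge complexity (index labels are an assumption)
N_SIZES = 11   # levels of size (index labels are an assumption)


def instrument_stimuli():
    """Return every (family, instrument, register) combination."""
    return [
        (family, instrument, register)
        for family, instruments in FAMILIES.items()
        for instrument in instruments
        for register in REGISTERS
    ]


def response_grid():
    """Return every (shape_index, size_index) cell a participant could pick."""
    return list(product(range(1, N_SHAPES + 1), range(1, N_SIZES + 1)))


stimuli = instrument_stimuli()
grid = response_grid()
print(len(stimuli), "instrument sounds;", len(grid), "response options")
```

Under these assumptions there are 12 instrument sounds (3 families × 2 instruments × 2 notes), each answered by choosing one of 121 cells, so a response yields both a shape value and a size value.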

For the instruments, lower notes were matched to rounder shapes than higher notes (F(1, 27)=20.53, p<.001), but the precise shape selection differed between instrument families (F(1, 27)=17.69, p<.001), between instruments within a family (F(1, 27)=63.93, p<.001), and all three factors interacted (F(1, 81)=15.36, p<.001), producing unique patterns for individual sounds. A similar pattern was observed for male (low) and female (high) voices articulating the /i/ sound in 'feet', the /u/ sound in 'shoes', the /a/ sound in 'umbrella', and /y/ (“ü”). Interestingly, shape (edge complexity) was modulated more than size, undermining the suggestion that audio-visual matching is primarily driven by the mapping of pitch to size. Instead, these findings suggest a mapping of spatial-to-temporal frequency, with higher pitches and higher harmonics corresponding to higher spatial frequencies.

Funding

Nanyang Assistant Professorship Grant to SJS
