Probing the Link between Vision and Language in Material Perception
Materials are the building blocks of our surroundings. Material perception enables us to create a vivid mental representation of our environment, allowing us to appreciate the qualities and aesthetics of the things around us and to decide how to interact with them. We can visually discriminate and recognize materials and infer their properties, and previous studies have identified diagnostic image features related to visual perception. Meanwhile, language reveals our subjective understanding of visual input and allows us to communicate relevant information about materials. To what extent words encapsulate visual material perception remains elusive. Here, we used deep generative networks to create an expandable image space that allows us to systematically create and sample stimuli of familiar and unfamiliar materials. We compared the representations of materials from two cognitive tasks: visual material similarity judgments and verbal descriptions. We observed a moderate correlation between vision and language within individuals, but language alone cannot fully capture the nuances of material appearance. We further examined the latent code of the generative model and found that the image-based representation exhibited only a weak correlation with human vision. Combining image- and semantic-level representations substantially improved the prediction of human perception. Our results imply that material perception involves a semantic understanding of scenes to resolve the ambiguity of visual information, going beyond mere reliance on image features. This work illustrates the need to consider the vision-language relationship when building a comprehensive model for material perception and other tasks in high-level vision.
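As a rough illustration of the kind of comparison described above, the sketch below correlates an image-level RDM and a semantic-level RDM with a perceptual RDM, individually and jointly, using Spearman correlation and a simple linear regression on the lower-triangular entries. The random matrices, variable names, and choice of regression are illustrative assumptions, not the exact analysis pipeline used in the study.

```python
# Minimal sketch: comparing image-level and semantic-level RDMs to a
# perceptual RDM, individually and jointly. All matrices here are random
# placeholders; substitute the actual RDMs from the data folders.
import numpy as np
from scipy.stats import spearmanr
from sklearn.linear_model import LinearRegression

def lower_triangle(rdm):
    """Vectorize the lower triangle (excluding the diagonal) of a square RDM."""
    rows, cols = np.tril_indices(rdm.shape[0], k=-1)
    return rdm[rows, cols]

# Hypothetical symmetric dissimilarity matrices standing in for the real RDMs.
rng = np.random.default_rng(0)
n = 30

def random_rdm():
    m = rng.random((n, n))
    m = (m + m.T) / 2
    np.fill_diagonal(m, 0.0)
    return m

perceptual_rdm = random_rdm()   # human similarity-judgment RDM
image_rdm = random_rdm()        # RDM from an image-level representation
semantic_rdm = random_rdm()     # RDM from verbal-description embeddings

y = lower_triangle(perceptual_rdm)
x_image = lower_triangle(image_rdm)
x_semantic = lower_triangle(semantic_rdm)

# Single-predictor comparisons (rank correlation between dissimilarities).
rho_image, _ = spearmanr(x_image, y)
rho_semantic, _ = spearmanr(x_semantic, y)
print("image vs. perception:", rho_image)
print("semantic vs. perception:", rho_semantic)

# Joint prediction: regress the perceptual dissimilarities on both predictors
# and correlate the fitted values with the observed ones.
X = np.column_stack([x_image, x_semantic])
fitted = LinearRegression().fit(X, y).predict(X)
rho_joint, _ = spearmanr(fitted, y)
print("joint vs. perception:", rho_joint)
```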
The raw data from the behavioral tasks can be found here (a loading sketch follows the list):
- Multiple_arrangement folder: the representational dissimilarity matrices (RDMs) from the Multiple Arrangement task.
- Verbal_description folder: the verbal descriptions provided by the participants.
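
A minimal loading sketch is shown below. It assumes the RDMs are stored as comma-separated square matrices and the verbal descriptions as plain-text files, one file per participant; the actual file names, extensions, and formats inside the two folders may differ, so adapt the glob patterns and readers accordingly.

```python
# Sketch of loading the two data folders. File formats are assumptions:
# CSV matrices for the RDMs and .txt files for the verbal descriptions.
from pathlib import Path
import numpy as np

def load_rdms(folder="Multiple_arrangement"):
    """Load each CSV file in the folder as a square dissimilarity matrix."""
    return {f.stem: np.loadtxt(f, delimiter=",")
            for f in sorted(Path(folder).glob("*.csv"))}

def load_descriptions(folder="Verbal_description"):
    """Load each text file in the folder as one participant's descriptions."""
    return {f.stem: f.read_text(encoding="utf-8")
            for f in sorted(Path(folder).glob("*.txt"))}

if __name__ == "__main__":
    rdms = load_rdms()
    descriptions = load_descriptions()
    print(f"Loaded {len(rdms)} RDMs and {len(descriptions)} description files.")
```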