<p dir="ltr">The emergence of synthetic images generated through artificial intelligence represents a paradigm shift in visual content production and interpretation. Much like early photography, once uncritically viewed as a ‘mirror’ of reality, generative AI produces potentially high-fidelity depictions of places, people, and scenes. At the same time, AI enables the instantaneous creation of entirely fabricated scenarios, challenging how reality is interpreted through images and rendering the accuracy of visual messages infinitely elastic. Moreover, the multimodal nature of synthetic image production blurs conventional boundaries between textual and visual literacies, underscoring the need for a more holistic understanding of visual literacy as part of a broader set of multimodal literacies. Addressing these changes requires not only competencies for distinguishing AI-generated images from photographic imagery but also a rethinking of visual literacy practices that accounts for the socio-technical dynamics of generative AI, in which images are produced through layered mediations of data patterns, interface architectures, and user inputs. The goal of this paper, then, is to identify the repertoire of literacies that emerge from multimodal co-production with visual generative AI. It is an agenda-setting article that uses journalism as a worked context and situates this exploration within one particular moment of human-AI interaction: the site of verbal-to-visual production.</p>