Dummy
2024, Voice User Interface, Unity, Video, Interview with Crazy Minnow Studio, 6:05

How do AI-driven lip synchronization systems translate and visually perform human speech, and what does this reveal about the ways software “speaks” or represents us in digital environments such as video games?

Ventriloquy is the art of making one’s voice appear to come from somewhere else. Dummy is a voice user interface that explores the mouth as a visual language—one we read, interpret, and have programmed to behave in specific ways. Using lip-synchronization software, the interface employs audio amplitude and phoneme detection to approximate mouth shapes known as visemes—the visual counterparts of phonemes, or units of sound in speech. By intentionally separating sound from shape, the simulation introduces misalignments that generate new facial expressions, meanings, and modes of communication.
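The amplitude-driven half of this technique can be sketched in a few lines. The snippet below is a hypothetical illustration, not the actual SALSA Lip-Sync implementation: it measures the loudness of an audio frame and picks a progressively wider mouth shape, with the viseme names and thresholds invented for the example.

```python
import math

# Illustrative viseme "mouth shapes", ordered from closed to wide open.
VISEMES = ["rest", "small", "medium", "large"]

def frame_amplitude(samples):
    """Root-mean-square amplitude of one audio frame (samples in [-1, 1])."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def select_viseme(samples, thresholds=(0.02, 0.15, 0.4)):
    """Pick a viseme by comparing frame amplitude against rising thresholds."""
    amp = frame_amplitude(samples)
    for viseme, threshold in zip(VISEMES, thresholds):
        if amp < threshold:
            return viseme
    return VISEMES[-1]
```

A silent frame maps to the closed "rest" shape, a loud one to "large"; phoneme detection layers a second lookup (phoneme to viseme) on top of this, and it is precisely in that translation step that the project's deliberate misalignments are introduced.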

The video uses audio from an interview with Crazy Minnow Studio, creators of SALSA Lip-Sync—an animation tool used to puppeteer character mouths in video games and 3D simulations. Through this process, the project explores how software ventriloquizes human voice, translating it into computational gestures that both mimic and distort human expression.

The work situates itself within contemporary discourse on AI-mediated communication and the automation of human expression. As lip-sync and generative AI systems increasingly perform speech in gaming, virtual production, and social media, questions of authorship, representation, and agency arise: Who is really speaking when a digital mouth moves? What assumptions about language, emotion, and identity are embedded in these algorithmic performances?

The project reveals that communication technologies, particularly AI-driven interfaces, function as interpretive systems, not transparent channels. By separating phonemes from visemes, Dummy exposes speech as a site of translation, where software interprets and performs voice through computational models of expression. These distortions make visible the cultural and aesthetic assumptions embedded in machine-mediated speech: notions of intelligibility, emotion, and even gendered expressiveness. Ultimately, when software "speaks" on our behalf, it does more than represent us; it redefines what counts as expression, agency, and presence in digital environments.

Thank you to Crazy Minnow Studio  


FINAL VIDEO
Video excerpt
Unity prototype
Unity simulation - waiting mode when no audio is present
Still from Unity simulation

HOUSE — Jenny Rodenhouse

Designer — Educator