Poster D117

The Importance of Noise in Audiovisual Learning: An Artificial Neural Network Simulation of the McGurk Effect

Poster Session D - Monday, April 15, 2024, 8:00 – 10:00 am EDT, Sheraton Hall ABC

Lukas Grasse1, Matthew Tata; 1University of Lethbridge

Artificial Neural Networks (ANNs) are now approaching human-like performance on many tasks, which opens new ways of probing human learning and perception (Kanwisher et al. 2023). Training ANNs to replicate human perception lets researchers investigate why our perceptual mechanisms behave as they do, and can also shed light on the sometimes mysterious workings of the networks themselves. This study explores the McGurk effect, an audiovisual illusion in which incongruent auditory and visual speech inputs lead to a fused but incorrect auditory percept. We recorded an audiovisual dataset of nine word pairs previously shown to elicit this effect and presented it to human participants and to several recent state-of-the-art ANNs trained on audiovisual speech. Human participants selected the word they perceived from a dropdown menu, while for the ANNs a k-nearest neighbours classifier applied to the output embeddings decoded a forced-choice word classification. We show that some ANNs do exhibit the McGurk effect under certain conditions. We further asked whether the McGurk effect in ANNs depends on training data, network architecture, or both. We found that training on audiovisual speech with noisy audio is crucial for replicating the illusion in ANNs, regardless of network architecture or training objective. Additionally, the network whose McGurk effect most closely resembled that of humans was trained with a biologically plausible self-supervised task. These findings suggest that visual cues incorporated during speech learning in noisy environments are key to the audiovisual fusion observed in the McGurk illusion.
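The k-nearest neighbours decoding step for the ANNs can be illustrated with a small sketch. The abstract does not state the similarity measure, the value of k, how reference embeddings were obtained, or exactly how trials were scored, so the Python below assumes cosine similarity, k = 3, reference embeddings taken from congruent tokens of each word, and a McGurk rate defined as the fraction of incongruent trials decoded as something other than the auditory word. The function names, the toy 2-D vectors, and the /ba/, /ga/, /da/ labels (the textbook case of auditory /ba/ with visual /ga/ heard as /da/) are hypothetical placeholders, not the study's stimuli or code.

```python
import math
from collections import Counter
from typing import Sequence

# Hypothetical sketch: cosine similarity, k=3, congruent reference tokens,
# and the McGurk scoring rule below are all assumptions, not the study's method.


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero-length embedding")
    return dot / (norm_a * norm_b)


def knn_decode(query, ref_embeddings, ref_labels, k=3):
    """Forced-choice decoding: majority vote over the k most similar references."""
    if not ref_labels or len(ref_embeddings) != len(ref_labels):
        raise ValueError("need one label per reference embedding")
    # Rank every reference token by similarity to the query, most similar first.
    ranked = sorted(
        zip(ref_embeddings, ref_labels),
        key=lambda pair: cosine_similarity(query, pair[0]),
        reverse=True,
    )
    # Count the labels of the k nearest neighbours. On a tie, most_common(1)
    # returns the label first seen, i.e. the one appearing earliest in the ranking.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def mcgurk_rate(decoded_labels, auditory_label):
    """Assumed scoring rule: fraction of incongruent trials not decoded as the auditory word."""
    if not decoded_labels:
        raise ValueError("no trials to score")
    mismatches = sum(1 for label in decoded_labels if label != auditory_label)
    return mismatches / len(decoded_labels)


# Toy 2-D "embeddings" for congruent reference tokens (real ones are high-dimensional).
refs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.7, 0.7], [0.6, 0.8]]
labels = ["ba", "ba", "ga", "ga", "da", "da"]

# Toy embeddings for two incongruent trials (auditory "ba" paired with visual "ga").
incongruent_trials = [[0.65, 0.75], [0.95, 0.05]]
decoded = [knn_decode(q, refs, labels, k=3) for q in incongruent_trials]
print(decoded)  # ['da', 'ba']
print(mcgurk_rate(decoded, auditory_label="ba"))  # 0.5
```

Cosine similarity makes the vote insensitive to embedding magnitude, a common choice for comparing network embeddings, and an odd k avoids ties when only two candidate words receive votes.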

Topic Area: PERCEPTION & ACTION: Multisensory


April 13–16  |  2024