Poster E97, Monday, March 26, 2:30-4:30 pm, Exhibit Hall C
Silent lip reading generates speech signals in auditory cortex
Karthikeyan Ganesan1, Jacob Zweig2, Marcia Grabowecky2, Satoru Suzuki2, Vernon Towle3, James Tao3, Shasha Wu3, David Brang1; 1University of Michigan, 2Northwestern University, 3University of Chicago
Observing a speaker’s mouth movements helps listeners perceive the sounds the speaker produces, particularly in noisy environments. It has been proposed that crossmodal activation in auditory cortex, engendered by visual access to mouth movements, might underlie this effect. However, the content of this crossmodal activation remains unknown. Here, we use deep learning algorithms to demonstrate that observing visual speech movements generates neural activity in auditory cortex similar to that generated while listening to phonemes. We recorded electrocorticographic (ECoG) activity from macroscopic depth electrodes implanted within the auditory cortices of epilepsy patients. On each trial, patients were presented with single phonemes or videos showing the lip movements articulating each phoneme. We constructed an ensemble of deep convolutional neural networks to determine whether the identities of four phonemes (from auditory-alone trials) and their corresponding visemes (from visual-alone trials) could be decoded from auditory cortical activity. As expected, the ensemble accurately decoded phonemes from activity in auditory cortex, with decoding accuracy driven by information in the theta band (4-7 Hz) and beta band (~20 Hz). Critically, the ensemble also accurately decoded visemes from activity in auditory cortex, revealing that lip reading generates viseme-specific activity in auditory cortex in the absence of any speech sound. Importantly, an ensemble trained on phonemes successfully decoded visemes, indicating that similar neural populations and coding schemes operate in auditory cortex regardless of the input modality. These results demonstrate that observing visual speech movements crossmodally activates auditory speech processing in a content-specific manner.
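The abstract does not specify the network architecture, preprocessing, or training procedure, so the following is only a minimal sketch of the cross-modal decoding logic it describes: train a classifier on auditory-alone (phoneme) trials and test it on visual-alone (viseme) trials. It assumes PyTorch; the class name ECoGDecoder, all layer sizes, trial counts, and data shapes are hypothetical, and synthetic tensors stand in for band-limited ECoG features.

```python
# Hypothetical sketch of phoneme-to-viseme cross-decoding; not the authors' code.
import torch
import torch.nn as nn

class ECoGDecoder(nn.Module):
    """Small 1-D CNN that predicts one of four phoneme/viseme labels
    from a (channels x time) window of ECoG activity."""
    def __init__(self, n_channels=8, n_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # collapse the time axis
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):  # x: (batch, channels, time)
        return self.classifier(self.features(x).squeeze(-1))

torch.manual_seed(0)
# Synthetic stand-ins; real inputs would be band-limited (e.g., theta- and
# beta-band) activity from auditory-cortex electrodes, not random noise.
X_aud = torch.randn(200, 8, 256)     # auditory-alone trials
y_aud = torch.randint(0, 4, (200,))  # phoneme labels (4 classes)
X_vis = torch.randn(200, 8, 256)     # visual-alone (lip-reading) trials
y_vis = torch.randint(0, 4, (200,))  # viseme labels (4 classes)

model = ECoGDecoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Train on auditory-alone trials only.
for epoch in range(20):
    opt.zero_grad()
    loss = loss_fn(model(X_aud), y_aud)
    loss.backward()
    opt.step()

# Cross-modal test: apply the phoneme-trained decoder to viseme trials.
with torch.no_grad():
    acc = (model(X_vis).argmax(1) == y_vis).float().mean()
print(f"cross-modal decoding accuracy: {acc:.2f}")
```

With the random data above, cross-modal accuracy sits near the 25% chance level; above-chance accuracy on real recordings is what would indicate shared neural coding across modalities, as reported in the abstract.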
Topic Area: PERCEPTION & ACTION: Multisensory