When we speak, we engage nearly 100 muscles, continuously moving our lips, jaw, tongue, and throat to shape our breath into the fluent sequences of sounds that form our words and sentences. A new study reveals how these complex articulatory movements are coordinated in the brain.
The research shows that the brain's speech centers are organized more according to the physical needs of the vocal tract as it produces speech than according to how the speech sounds (its "phonetics"). Linguists divide speech into abstract units of sound called "phonemes" and consider the /k/ sound in "keep" the same as the /k/ in "coop." In reality, however, your mouth forms the sound differently in these two words to prepare for the different vowels that follow, and this physical distinction now appears to matter more to the brain regions responsible for producing speech than the theoretical sameness of the phoneme.
The findings, published in Neuron, extend previous studies on how the brain interprets the sounds of spoken language and could help guide the creation of a new generation of prosthetic devices for people who are unable to speak: brain implants could monitor neural activity related to speech production and rapidly and directly translate those signals into synthetic spoken language.
In the new study, researchers asked five volunteers awaiting surgery, who had electrocorticography (ECoG) electrodes placed over a region of ventral sensorimotor cortex that is a key center of speech production, to read aloud a collection of 460 natural sentences. The sentences were expressly constructed to cover nearly all the possible articulatory contexts in American English. This comprehensiveness was crucial for capturing the complete range of "coarticulation," the blending of neighboring phonemes that is essential to natural speech. Individual electrodes were found to encode a diversity of articulatory kinematic trajectories (AKTs), each reflecting coordinated movements of the articulators toward a specific vocal tract shape.
"Without coarticulation, our speech would be blocky and segmented to the point where we couldn't really understand it," said one of the study's authors.
The research team was not able to simultaneously record the volunteers' neural activity and their tongue, mouth and larynx movements. Instead, they recorded only audio of the volunteers speaking and developed a novel deep learning algorithm to estimate which movements were made during specific speaking tasks.
This approach allowed the researchers to identify distinct populations of neurons responsible for the specific vocal tract movement patterns needed to produce fluent speech sounds, a level of complexity that had not been seen in previous experiments that used simpler syllable-by-syllable speech tasks.
The experiments revealed that a remarkable diversity of movements was encoded by neurons surrounding individual electrodes. The researchers found four emergent groups of neurons that appeared to be responsible for coordinating movements of the lips, tongue, and throat into the four main configurations of the vocal tract used in American English. The researchers also identified neural populations associated with specific classes of phonetic phenomena, including separate clusters for consonants and vowels of different types, but their analysis suggested that these phonetic groupings were largely a byproduct of more fundamental groupings based on different types of muscle movement.
Regarding coarticulation, the researchers discovered that the brain's speech centers coordinate different muscle movement patterns based on the context of what is being said and the order in which different sounds occur. For example, the jaw opens more to say "tap" than to say "has": although both words contain the same vowel sound (/ae/), in "has" the mouth has to get ready to close to make the /z/ sound. The researchers found that neurons in the ventral sensorimotor cortex were highly attuned to this and other coarticulatory features of English, suggesting that these brain cells are tuned to produce fluid, context-dependent speech rather than to read out discrete speech segments in serial order.
"During speech production, there is clearly another layer of neural processing that happens, which enables the speaker to merge phonemes together into something the listener can understand," said another author.
"This study highlights why we need to take into account vocal tract movements and not just linguistic features like phonemes when studying speech production," said an author.
Brain activity patterns underlying fluent speech