Neurons identify pitch changes in spoken language

Researchers have identified neurons in the human brain that respond to pitch changes in spoken language, which are essential to clearly conveying both meaning and emotion. The study was published in Science.

"One of the lab's missions is to understand how the brain converts sounds into meaning," the study's first author said. "What we're seeing here is that there are neurons in the brain's neocortex that are processing not just what words are being said, but how those words are said."

Changes in vocal pitch during speech - part of what linguists call speech prosody - are a fundamental part of human communication, nearly as fundamental as melody to music. In tonal languages such as Mandarin Chinese, pitch changes can completely alter the meaning of a word, but even in a non-tonal language like English, differences in pitch can significantly change the meaning of a spoken sentence.

For instance, "Sarah plays soccer," in which "Sarah" is spoken with a descending pitch, can be used by a speaker to communicate that Sarah, rather than some other person, plays soccer. In contrast, the same sentence with the emphasis on "soccer" indicates that Sarah plays soccer, rather than some other game. And adding a rising tone at the end of a sentence ("Sarah plays soccer?") indicates that the sentence is a question.

The brain's ability to interpret these changes in tone on the fly is particularly remarkable, given that each speaker also has their own typical vocal pitch and style (that is, some people have low voices, others have high voices, and others seem to end even statements as if they were questions). Moreover, the brain must track and interpret these pitch changes while simultaneously parsing which consonants and vowels are being uttered, what words they form, and how those words are being combined into phrases and sentences -- with all of this happening on a millisecond scale.

Previous studies in both humans and non-human primates have identified areas of the brain's frontal and temporal cortices that are sensitive to vocal pitch and intonation, but none have answered the question of how neurons in these regions detect and represent changes in pitch to inform the brain's interpretation of a speaker's meaning.

In the new study, researchers asked 10 volunteers, who were awaiting surgery and already had electrodes in place, to listen to recordings of four sentences as spoken by three different synthesized voices:

"Humans value genuine behavior"
"Movies demand minimal energy"
"Reindeer are a visual animal"
"Lawyers give a relevant opinion"

The sentences were designed to have the same length and construction, and each could be played with four different intonations: neutral, emphasizing the first word, emphasizing the third word, or as a question. These intonation changes alter the meaning of the sentence: "Humans [unlike Klingons] value genuine behavior"; "Humans value genuine [not insincere] behavior"; and "Humans value genuine behavior?" [Do they really?]

Researchers monitored the electrical activity of neurons in a part of the volunteers' auditory cortices called the superior temporal gyrus (STG), which previous research had shown might play some role in processing speech prosody.

They found that some neurons in the STG could distinguish between the three synthesized speakers, primarily based on differences in their average vocal pitch range. Other neurons could distinguish between the four sentences, no matter which speaker was saying them, based on the different kinds of sounds (or phonemes) that made up the sentences ("reindeer" sounds different from "lawyers" no matter who's talking). And yet another group of neurons could distinguish between the four different intonation patterns. These neurons changed their activity depending on where the emphasis fell in the sentence, but didn't care which sentence it was or who was saying it.

To check that they had correctly worked out the brain's system for pulling intonation information from sentences, the team designed an algorithm to predict how a neuron's response to any sentence should change based on speaker, phonetics, and intonation. They then used this model to predict how the volunteers' neurons would respond to hundreds of recorded sentences by different speakers. They showed that while the neurons responsive to the different speakers were focused on the absolute pitch of the speaker's voice, the ones responsive to intonation were more focused on relative pitch: how the pitch of the speaker's voice changed from moment to moment during the recording.
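The difference between absolute and relative pitch can be made concrete with a small sketch. This is an illustration only, not the researchers' model: the function names, the pitch values, and the choice to express relative pitch in semitones around each speaker's own average are all assumptions made for the example.

```python
import math

def absolute_pitch(f0_hz):
    """Average pitch in Hz: a rough stand-in for a speaker-level cue."""
    return sum(f0_hz) / len(f0_hz)

def relative_pitch(f0_hz):
    """Pitch contour in semitones, centered on the speaker's own mean log-pitch.

    Centering removes the speaker's baseline, leaving only how the pitch
    moves over the course of the sentence.
    """
    semitones = [12 * math.log2(f) for f in f0_hz]
    mean_semitones = sum(semitones) / len(semitones)
    return [s - mean_semitones for s in semitones]

# Hypothetical pitch tracks (Hz) for a question with a rising final tone.
low_voice = [100.0, 100.0, 100.0, 150.0]
high_voice = [200.0, 200.0, 200.0, 300.0]  # same contour, one octave higher

print(absolute_pitch(low_voice), absolute_pitch(high_voice))
print(relative_pitch(low_voice))
print(relative_pitch(high_voice))
```

In this toy example the two voices differ in average pitch, which is the kind of cue that could separate speakers, while their centered contours match, which is the kind of cue that could signal a question regardless of who is speaking.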

"To me this was one of the most exciting aspects of our study," the first author said. "We were able to show not just where prosody is encoded in the brain, but also how, by explaining the activity in terms of specific changes in vocal pitch."

These findings reveal how the brain begins to take apart the complex stream of sounds that make up speech and identify important cues about the meaning of what we're hearing. Who is talking, what are they saying, and just as importantly, how are they saying it?

"Now, a major unanswered question is how the brain controls our vocal tracts to make these intonational speech sounds," said the paper's senior author. "We hope we can solve this mystery soon."

https://www.ucsf.edu/news/2017/08/408136/how-human-brain-detects-music-speech