The large language models that allow AI systems like ChatGPT to predict the next word based on what came before tend to overlook a crucial aspect of human communication: the fact that meaning is not conveyed by words alone. In a new study published in Proceedings of the National Academy of Sciences (PNAS), researchers from Prof. Elisha Moses’s lab have demonstrated how the melody of speech functions as a distinct language that follows its own rules. Referred to by the linguistic term “prosody,” this language encompasses variations in pitch, loudness, tempo, and voice quality, and adds a nuanced layer of meaning beyond words.
Thanks to the Moses team’s innovative approach, we now have a better understanding of the statistical rules and patterns of prosody, which predates words in human evolution, and of how it contributes to human communication. The study, led by linguist Dr. Nadav Matalon and neuroscientist Dr. Eyal Weinreb from the Moses lab in the Department of Physics of Complex Systems, approached prosody as if it were an unfamiliar language, using an AI model to analyze massive collections of audio recordings of spontaneous conversations in English. The model identified hundreds of recurring elementary prosodic patterns that form a basic prosodic vocabulary, and it revealed how these patterns can perform different linguistic functions depending on the context.
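
For readers curious about what this kind of analysis might look like in practice, here is a minimal sketch, not the study’s actual pipeline: it assumes the Python libraries librosa and scikit-learn, hypothetical utterance files (utt1.wav and so on), a 50-point contour length, and simple k-means clustering, and it shows one way to extract speaker-normalized pitch contours from short speech segments and group them into a small set of recurring melodic patterns, loosely analogous to the elementary prosodic patterns described above.

```python
# Illustrative sketch only -- NOT the method used in the PNAS study.
# Assumes librosa and scikit-learn are installed; file names are hypothetical.

import numpy as np
import librosa
from sklearn.cluster import KMeans

def pitch_contour(audio_path, n_points=50):
    """Return a length-normalized, z-scored pitch (F0) contour for one utterance."""
    y, sr = librosa.load(audio_path, sr=16000)
    # pyin estimates the fundamental frequency frame by frame (NaN where unvoiced)
    f0, _, _ = librosa.pyin(y,
                            fmin=librosa.note_to_hz("C2"),
                            fmax=librosa.note_to_hz("C6"),
                            sr=sr)
    f0 = f0[~np.isnan(f0)]  # keep voiced frames only
    if len(f0) < 2:
        return None
    # Resample every contour to the same length so contours can be compared
    contour = np.interp(np.linspace(0, len(f0) - 1, n_points),
                        np.arange(len(f0)), f0)
    # Z-score per utterance so the shape of the melody, not the speaker's
    # absolute pitch range, drives the clustering
    return (contour - contour.mean()) / (contour.std() + 1e-8)

# Hypothetical list of utterance files; cluster their contours into a small
# "vocabulary" of recurring pitch patterns.
paths = ["utt1.wav", "utt2.wav", "utt3.wav"]
contours = [c for p in paths if (c := pitch_contour(p)) is not None]
if contours:
    kmeans = KMeans(n_clusters=min(8, len(contours)), n_init=10, random_state=0)
    labels = kmeans.fit_predict(np.stack(contours))
    print("prosodic-pattern label per utterance:", labels)
```

Length-normalizing and z-scoring each contour is only one of many ways to make melodies comparable across speakers and utterance lengths; the representation and model used in the actual study may differ.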


This research lays the foundation for compiling a “dictionary” of prosody, which would catalog all the prosodic patterns we employ and their function or meaning in each case. Another future application could be the development of an AI tool capable of understanding and conveying messages based on the melody of speech rather than words alone. “Imagine if Siri could understand from the melody of your voice how you feel about a certain subject and adapt her response accordingly,” Dr. Weinreb says. “We already have brain implants that convert neural activity into speech for people who can’t speak. If we can teach prosody to a computer model, we’ll be adding a significant layer of human expression that robotic systems currently lack.”
ELISHA MOSES IS SUPPORTED BY:
* The Maurice and Ilse Katz Professorial Chair