Interactive alignment in the evolution of vocal communication

Workshop at the International Conference on the Evolution of Language (EVOLANG XV) 2024, Madison, WI, USA

May 18th, 2024, 9:15-12:00 (Central Daylight Time)

Organizers: Julia Hyland Bruno, New Jersey Institute of Technology, USA, and Olga Fehér, University of Warwick, UK
Speakers:
Chiara De Gregorio, Department of Psychology, University of Warwick
Julie Elie, Department of BioEngineering, UC Berkeley
Chiara Gambi, Department of Psychology, University of Warwick
Marisa Hoeschele, Acoustic Research Institute, Austrian Academy of Sciences
Eduardo Mercado III, Department of Psychology, University at Buffalo, SUNY
Thomas Wolf, Department of Cognitive Science, Social Mind Centre, CEU

In a highly influential paper, Levinson proposed that humans have a special cognitive ability that enables complex social interaction, which in turn drives linguistic, social and cultural evolution in our species (Levinson, 2006). This “interaction engine” relies on abilities such as ascribing intentions to others, mental coordination, and simulating interlocutors’ states of mind. According to this theory, language evolved in a rich interactive context for the purpose of communication. In recent years, the evolution of the cognitive capacities underlying the interaction engine has been studied in both humans and non-human primates (Heesen & Frohlich, 2022). Studies have largely focused on interactional processes such as turn-taking, joint action, repair and sequence organization (e.g., Holler et al., 2016). We believe that vocal alignment is one of the essential mechanisms facilitating communicative interaction, but it has not yet been sufficiently examined through a comparative, interdisciplinary lens.

Linguistic alignment has been shown to be a major mechanism by which people come to share linguistic and mental representations (Pickering & Garrod, 2004). Alignment makes communication more efficient because communication then rests on aligning information rather than simply transferring it (Pickering & Garrod, 2006). It appears to be a largely automatic process, a form of inter-individual imitation that occurs during interaction (Chartrand & Bargh, 1999), although it is mediated by social expectations (Weatherholtz et al., 2014). Another influential theory, communication accommodation theory (Giles & Ogay, 2007), proposes that people use linguistic convergence and divergence to decrease and increase social distance, respectively. Whether or not social intention drives alignment, alignment plays a major role in facilitating linguistic and social interaction. It is therefore likely that vocal alignment shaped language evolution in important ways, both by facilitating cooperation between interlocutors and by directly influencing language structure (Fehér et al., 2019).

While language is unique to humans, vocal learning (a cognitive ability that underpins spoken language) is shared with a small number of non-human animal groups. Songbirds in particular have become the most widely accepted animal model for language (e.g., Searcy & Nowicki, 2019). In songbirds, both moment-to-moment vocal alignment and repertoire convergence among closely affiliated individuals have been observed (Beecher, 2000; Hyland Bruno, unpublished). In this workshop, we would like to extend the Evolang discussion of interactive processes to non-human vocal learners such as songbirds, and to other vocally communicating animals, because we believe that considering the convergent mechanisms that underlie vocal interaction in other organisms (e.g., Benichov et al., 2016; Hoffmann et al., 2019; Coleman et al., 2021; Bacciadonna et al., 2022; De Gregorio et al., 2022; Costalunga et al., 2023) can yield valuable insights into the evolution of uniquely complex communication in humans.

9:15-9:25 Introduction by Julia and Olga

15 min talks + 5 min Q&A

9:25-9:45 Thomas Wolf - How do work songs stabilize the tempo of rhythmic joint actions?

Recent research has shown that people engaged in rhythmic interactions have a strong tendency to speed up, a phenomenon known as joint rushing. However, successful joint action often requires interpersonal coordination at a stable tempo. One way to maintain a stable tempo despite this tendency is to vocalize, as in the work songs that accompany strenuous joint actions. Two characteristics of work songs may be particularly effective: solo passages and metric subdivisions of the intervals between instrumental actions. The present study tested whether these two characteristics help pairs of participants overcome joint rushing and maintain a target tempo. Participants performed a joint synchronization-continuation finger-tapping task: they first synchronized with a fading metronome and then continued tapping while counting aloud. In three experiments we tested the effects of solo counting and metric subdivisions separately and in combination. Only the combination of solo counting and subdivisions helped participants maintain the target tempo. The results indicate that vocalizations can indeed stabilize the tempo of rhythmic joint actions; other aspects of work songs may also support the achievement of joint goals.
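The abstract does not say how tempo stability was quantified. As a minimal sketch, assuming tap onset times (in seconds) from the continuation phase and a known target inter-tap interval, one simple index of joint rushing is the least-squares slope of the inter-tap intervals across successive taps (negative means the taps are speeding up), alongside the mean deviation from the target interval. The function names are hypothetical; this is not the study's analysis code.

```python
from statistics import fmean


def inter_tap_intervals(tap_times):
    """Differences between consecutive tap onsets (seconds)."""
    return [b - a for a, b in zip(tap_times, tap_times[1:])]


def tempo_drift(tap_times, target_iti):
    """Hypothetical rushing index for a continuation-phase tap sequence.

    Returns (slope, mean_relative_deviation):
      slope: least-squares slope of inter-tap interval against tap index,
             in seconds per tap; negative values mean the intervals shrink,
             i.e. the tempo speeds up.
      mean_relative_deviation: mean interval divided by the target interval,
             minus 1; negative values mean faster than target on average.
    """
    itis = inter_tap_intervals(tap_times)
    if len(itis) < 2:
        raise ValueError("need at least three taps to estimate a slope")
    xs = range(len(itis))
    x_mean, y_mean = fmean(xs), fmean(itis)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, itis))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den, y_mean / target_iti - 1.0
```

For example, `tempo_drift([0.0, 0.5, 0.99, 1.47, 1.94], 0.5)` gives a slope of about -0.01 s per tap and a mean deviation of about -3%, the signature of rushing. For a pair, the index could be computed per partner or on the merged tap sequence, depending on the design.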
9:45-10:05 Chiara Gambi - The puzzle of turn-taking in developmental psycholinguistics

Smooth turn-taking appears to be a universal feature of adult human conversation (Stivers et al., 2009). Levinson (2006; 2016) famously argued that turn-taking precedes the development of complex language production abilities in ontogeny. Indeed, many of the building blocks thought to be necessary for smooth turn-taking in conversations between adults appear to be in place very early in development, including sensitivity to contingent responding, entrainment, and prediction. Yet children’s turn-taking is much slower than that of adults. I will argue that, consistent with Levinson’s claim, the slow development of language production skills is likely to be an important bottleneck. In addition, however, the ability to flexibly switch between understanding others and crafting one’s own message is also key to adult-like conversational skills, and it is likely slow to emerge over developmental time.

10:05-10:25 Marisa Hoeschele - Let’s include unstereotyped vocalizations in our search for vocal alignment in other species

When researchers study animal vocalizations, silence is typically used as the marker between vocal units. If we did the same for humans, however, we would treat entire sentences as single units of sound. Because we often produce sentences that we never utter again, our vocalizations are unstereotyped at the level of the breath unit. A subset of vocal learning species also produces similar unstereotyped vocalizations: vocal units separated by silence that appear to have a unique structure each time they are uttered. These unstereotyped vocalizations are typically lumped together into broad categories without deeper study; however, they may be the closest analogue to human speech in terms of structure. Because of physical limitations, all unstereotyped vocalizations must be built from fundamental subunits, which in humans we refer to as "phones". Humans combine the subunits of this unstereotyped vocal structure with vocal alignment (e.g., sequence organization and turn-taking between individuals) to allow language to emerge. It has been argued that the flexibility offered by this combinatorial system is critical in allowing humans to refer to new objects and situations. We have been studying the unstereotyped "warble" vocalization of budgerigars, a small parrot species, and discovered that these unstereotyped vocal units are made up of subunits that appear to resemble consonants and vowels in human speech. While it is unknown whether the unstereotyped vocalizations of other species are referential or simply a display of vocal prowess, studying budgerigars and other species with unstereotyped vocal units in terms of vocal alignment may be especially fruitful in the search for language-like communication systems in other species.
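To make the silence-based convention concrete, here is a minimal sketch, assuming a precomputed per-frame amplitude envelope (for example RMS values) plus a hypothetical silence threshold and minimum gap length. Real pipelines usually work from spectrograms and tune these parameters per species and recording setup; the function name is illustrative only.

```python
def segment_by_silence(envelope, threshold, min_gap_frames):
    """Split an amplitude envelope into vocal units separated by silence.

    envelope: per-frame amplitude values (hypothetical input format).
    threshold: frames below this amplitude count as silent.
    min_gap_frames: a silent stretch must last at least this many frames
        (>= 1) to end a unit; shorter dips stay inside the current unit.
    Returns a list of half-open (start, end) frame ranges, one per unit.
    """
    units = []
    start = None      # first frame of the unit being built, if any
    last_loud = None  # most recent frame at or above threshold in that unit
    for i, value in enumerate(envelope):
        if value >= threshold:
            if start is None:
                start = i
            last_loud = i
        elif start is not None and i - last_loud >= min_gap_frames:
            # The silent gap is long enough: close the unit at its last loud frame.
            units.append((start, last_loud + 1))
            start = None
    if start is not None:
        # Recording ended mid-unit; close it at the last loud frame.
        units.append((start, last_loud + 1))
    return units
```

Because units are defined only by silent gaps, a warble produced without pauses comes out as one long unit however varied its internal structure; identifying consonant- and vowel-like subunits requires a separate, within-unit analysis.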

10:25-10:35 Break

10:35-10:55 Julie Elie - Vocal production in the Egyptian fruit-bat

Some species have evolved the ability to use the sense of hearing to modify existing vocalizations, or even to create new ones. This ability corresponds to various forms of vocal production learning, all of which are possessed by humans and are independently displayed by distantly related vertebrates. Among mammals, only a few species are thought to possess such vocal learning abilities. Yet the anatomical and neurophysiological features that determine whether a mammal is capable of vocal learning remain largely elusive. Using a multidisciplinary approach spanning vocal behavior, anatomy and neurophysiology, we explored the vocal learning trait in Egyptian fruit-bats (Rousettus aegyptiacus). First, we tested whether an intact auditory system is necessary for the development of this bat's typical vocal repertoire. Eliminating pups’ sense of hearing at birth and assessing the effects on vocal production in adulthood enabled us both to causally test the vocal learning ability of Egyptian fruit-bats and to distinguish learned from innate aspects of their vocalizations. Second, we tested, and found evidence for, the long-standing hypothesis of a neuroanatomical specialization in vocal learners: a direct projection from the motor cortex to the motoneurons that control the muscles of the phonation organ. Finally, we investigated the possible role of the cortical neurons in this region that project directly to laryngeal motoneurons. We found that vocal production is sexually dimorphic in the Egyptian fruit-bat and that only some vocalizations require auditory feedback for their production.

10:55-11:15 Eduardo Mercado III - Vocal interactions between singing humpback whales

Humpback whales produce predictably patterned sound sequences within multi-hour sessions, often while alone and motionless. These sequences are among the most acoustically complex vocal patterns produced by any mammal. They are also among the most dynamic, in that individual whales constantly modify both the sounds and the patterns they produce throughout their adult lives. Recent acoustic analyses suggest that humpbacks can flexibly modulate the properties of vocalizations within these sequences along multiple dimensions, and that they do so in real time. This finding suggests that whales vocalizing within earshot of one another should be able to adjust their vocalizations interactively in response to what they hear. Preliminary analyses of co-vocalizing humpback whales are revealing a variety of ways in which humpbacks may coordinate the production of overlapping sound sequences by dynamically adjusting features of the sounds and sound patterns they produce.

11:15-11:35 Chiara De Gregorio - Rhythmic alignment in the vocal interactions of duetting primates

Vocal alignment is a phenomenon whereby two vocalizing individuals tend to adapt, or align, their vocalizations to one another. When the adaptation concerns temporal structure, we can refer to it as rhythmic alignment. This kind of vocal interaction is a crucial feature of human conversation, but it is unclear which selective pressures drove the emergence of such a fine-tuned process, and its presence across the animal kingdom remains understudied. Among non-human primates, duetting species are a promising model for investigating how rhythmic alignment evolved in our lineage. These primates, which belong to only a few taxa, communicate through complex vocal interactions in which vocal units are not emitted randomly but follow a specific temporal pattern. We investigated rhythmic alignment in four species of duetting primates (Indri indri, Plecturocebus cupreus, Nomascus siki, Hylobates pileatus) whose duets differ in the degree of vocal overlap between pair-mates. As proxies of rhythmic alignment, we used the duration of the inter-onset intervals between vocal units and the presence of different rhythmic categories. Indris and titi monkeys mostly sing in duets and choruses, and solo songs are very rare. We found that males and females tend to have different singing tempi but converge towards an isochronous rhythm. Moreover, in titi monkeys, members of the same reproductive pair did not differ in rhythmic regularity, and the pair that showed lower regularity was a relatively recently formed couple. Gibbon species, on the other hand, sing both duets and solo songs. N. siki duets with no overlap between male and female phrases, while H. pileatus duets with partial overlap. In N. siki, individual identity and social factors influenced rhythmic alignment, with different rhythmic categories emerging depending on the presence or absence of co-singers. In H. pileatus, by contrast, there were no differences in rhythmic structure between solo songs and duets. Our work suggests that, as in humans, the strength of alignment to duetting partners may vary with social factors and singing style, and it represents a first comparative investigation of the selective pressures that shaped rhythmic alignment in the primate clade.
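The abstract names inter-onset intervals (IOIs) and rhythmic categories as proxies of rhythmic alignment but does not spell out the computation. A common approach in comparative rhythm research, shown here as an assumption rather than the authors' exact method, computes rhythm ratios r_k = IOI_k / (IOI_k + IOI_{k+1}) from consecutive intervals: 0.5 indicates isochrony (1:1), while values near 1/3 and 2/3 indicate 1:2 and 2:1 patterns. The tolerance band and the function names below are hypothetical.

```python
from collections import Counter


def inter_onset_intervals(onsets):
    """Intervals (seconds) between consecutive vocal-unit onsets."""
    return [b - a for a, b in zip(onsets, onsets[1:])]


def rhythm_ratios(onsets):
    """r_k = IOI_k / (IOI_k + IOI_{k+1}) for each pair of consecutive IOIs."""
    iois = inter_onset_intervals(onsets)
    return [a / (a + b) for a, b in zip(iois, iois[1:])]


# Hypothetical category centres and tolerance; real studies may choose
# category boundaries differently (e.g. from the observed ratio distribution).
CATEGORIES = {"1:1 (isochronous)": 1 / 2, "1:2": 1 / 3, "2:1": 2 / 3}
TOL = 0.04


def classify(ratio, tol=TOL):
    """Label a rhythm ratio by the category centre it falls near, if any."""
    for label, centre in CATEGORIES.items():
        if abs(ratio - centre) <= tol:
            return label
    return "other"


def category_counts(onsets):
    """Count how many rhythm ratios in a song fall into each category."""
    return Counter(classify(r) for r in rhythm_ratios(onsets))
```

Comparing such category counts between solo songs and duets, or between males and females, is one way contrasts like those described above could be examined.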

11:35-12:00 General Discussion