It’s been studied for millennia. But what connects us to certain voices? Chris Catchpole explores

Noun; the sound produced in a person’s larynx and uttered through the mouth.

By the 45th second of Think, first heard on her 1968 Atlantic Records album, Aretha Now, Aretha Franklin has scaled a full two octaves. Way before the chorus is even teetering on the horizon, it’s Franklin’s voice, swinging down, reaching up and stomping forward, that tells us all we need to know: that this song is not only irresistibly dance-worthy, it’s an act of defiance.

We know this because, as the chorus ascends, and the tune’s demands for freedom rise into a rallying cry, we feel it. So how does the human voice produce such a response? And how does that emotion transcend decades and geographical locations – and, often, stay with us long after the vocal has ended?

Our relationship with voice is deep-rooted. In our daily lives, our own voices are capable of communicating up to 24 emotions solely through the non-verbal sounds we emit, often called ‘vocal bursts’ – a gasp to denote surprise, interest (ah) or confusion (huh?), for example. How our voices sound, according to some studies, can make up 38% of our first impression on others – generally, we’ll use a higher pitch to speak to someone we determine to have a higher status, for example, regardless of how we perceive ourselves. In turn, we use our perception of other people’s voices to determine whether or not we trust a person. Generally, the fundamental frequency of the voice’s complex speech tone – or f0 – is around 100-120 Hz for men, and one octave higher for women. Yet the vocal ranges we are able to phonate span extremes of highs and lows in terms of pitch. Our vocal compass, which consists of all sounds from the lowest grunt to the highest squeak, is even more vast.

A concept studied for millennia

This is a phenomenon that has fascinated humans for millennia: in fact, the ancient Greeks are thought to be the first to have attempted to record the spectrum of the voice’s qualities, as perceived by the listener. Ongoing psychological studies of paralinguistics (or the social cues we find in voices) have since determined that continuous sounds at a lower frequency, like the sounds we use in meditation, are more comforting to us. 

In 2015, US company Jobaline went further: they analysed millions of audio files in order to develop software that can predict how a listener might feel upon hearing a certain voice. They identified how different elements of our voices – from pitch to intonation to energy – interact to give each voice its unique sound. And what will their findings contribute to? Predicting which voices will be best for a particular job, be that a hospital clerk or podcast host, or even how a computerised voice can create an emotional connection with listeners.

Voices that transcend time

It’s why Solomon Linda’s Mbube, or The Lion Sleeps Tonight, first recorded with his a cappella group The Evening Birds, is still rousing more than 80 years after the group – and many after them – stepped into a South African studio to record the song. And why certain songs since then have followed suit in being recorded and re-recorded, with varying impact. Listen to The Impressions’ People Get Ready, for example, first recorded in 1965. The gospel-inspired song was named the unofficial anthem of the Civil Rights Movement by Martin Luther King Jr, and was often used to get people marching or to calm and comfort them during political and social unrest. The original, the version by Bob Marley (who also recorded it in 1965), and those by The Chambers Brothers or Aretha Franklin in 1968 all produce a stirring track, thick with voices and packed with quiet determination. Later versions from the likes of Bob Dylan or Rod Stewart arguably lack the vocal impact required by the meaningful message of the song. Simply put, it feels different.

It’s because a certain voice can soothe us, move us, or be recognisable to us even from the first intake of breath. A single voice can trigger a spectrum of emotions and memories, place us in a moment or move us forward into another. It is a powerful tool, and one that advertisers and creatives alike have been acutely aware of since the salad days of radio broadcasting in the 1920s. In today’s world, with over 48 million podcast episodes already released, ever-more sophisticated and innovative ways to record, produce and present the human voice are being developed so that we can interact with and learn from it at a deeper level. The gaming world has a similar story, too – a well-acted, well-recorded, perfectly pitched performance is crucial for the immersive experience a gamer can have with a character.

The depth that we feel when we hear certain voices, whether they are speaking, singing, or communicating in other ways, is determined by the quality of how that sound is transmitted, and how well it interacts with our surroundings. A crisp whisper in well-crafted headphones can feel incredibly intimate; a soaring soprano can produce a tear. In the right setting, a voice can be a comfort, a challenge, a warning. More universally, it’s a release – or, as Aretha cried back in 1968, it’s freedom.

Take Blur’s The Universal as an example. Say you stuck it on late at a party: it becomes a communal, cathartic singalong. Listen alone on headphones, however, and all that outward projection turns in on itself, and the euphoric transforms into melancholy. You need both of those things; they’re the emotional kicker you get from music. Gabriel Roth, Daptone’s label boss, wasn’t being flippant when he said that music makes you feel good: listening to music actually produces neurochemicals that make you feel happy. And then there are the mental health benefits you can get from your favourite record. Some songs take the weight off in the same way a good conversation does.

It felt stupid the first time I wrote “listen to music” on a to-do list back at the start of lockdown. But it was worth it. Now, placing my headphones over my ears, I can be transported, even for just a few undisturbed moments. So what are we doing when we’re listening? We’re engaging in another world. The only thing we need to think about is how.

Chris Catchpole is a music journalist and writer. He was assistant editor at Q Magazine for ten years and currently writes for Rolling Stone, MOJO, The Guardian and Record Collector.

Lexicon of Sound