At this point, you’ve probably heard a bit about spatial audio from all over, but what is spatial audio exactly? What is this big feature that Google, Apple, and Samsung are all including in their products? And is it the same thing as Dolby Atmos? This post will go into detail about what spatial audio is and why we should care. However, if watching something is more your speed, check out our webinar on the topic.
History of Audio
When we’re not using headphones or earbuds, we’re listening to sound in 3 dimensions. It comes from every direction (above, below, to the right, left, behind, in front, and everything in between), and our brains can decipher these sounds to determine direction.
Engineers have pursued technology to emulate this natural experience for well over a century. In 1881, a French engineer named Clément Ader invented the Théâtrophone, which used 80 telephone transmitters connected across the stage of the Paris Opera. These transmitters created a binaural stereoscopic sound (a method of recording sound with two microphones arranged to replicate the 3D stereo sound one perceives in real life). With this, opera lovers could listen from as far away as two kilometers.
In World War I and the early part of World War II, acoustics played a large part in determining the direction of aircraft. Each country had its own way of picking up and amplifying noise to help operators hear plane engines and determine their direction. Looking back, these devices seem a bit comical, but it’s clear that audio was a key war technology.
About 30 years later, in 1972, Neumann released their first commercial binaural recording system, which made replicating spatial sound simpler and more consistent across applications. Technology and methods have since improved, including a newer technique that uses microphone arrays instead of just two distinct microphones to get a more detailed recording of a given space.
Today, advanced audio techniques are being integrated in all sorts of audio applications (from music to gaming) on all sorts of devices like sound bars, headphones, TWS earbuds, automobiles, and XR devices.
Spatial Audio Family Tree
The way we listen to audio has also changed through the years. It started with mono output, like you’d hear from a radio, where all of the sound came from one source. Sound then evolved to use more speakers to give listeners a more engaging and encompassing experience.
The earliest form of this was stereo sound, with two speakers, followed by quad sound, with four. This advanced to surround sound, with 5.1 and 7.1 setups (5 and 7 speakers, respectively, plus a single subwoofer for lower frequencies), and then to large speaker arrays (way more than 7) for more spatial output.
While 5.1 and 7.1 surround sound systems emulated sound around you, they really only did so in a single plane, since the speakers sit at about the same height. Dolby Atmos came into the audio space to add cues for sounds above and below you, creating a more immersive experience.
So what IS Spatial Audio?
What exactly is this big feature that Google, Apple, and Samsung are all including in their products? And how is it different from Dolby Atmos? You may have noticed that I never called the prior sound experiences spatial audio, even though one would think that anything with two or more speakers would qualify as “spatial” by the dictionary definition (relating to or occupying space). And I’d agree with you.
However, in the industry, spatial audio refers to a very specific type of experience. You may also hear it referred to as 3D audio or, in Samsung’s case, 360 audio. The technical term for these types of experiences is head-tracked binaural audio. Let’s break it down into its pieces. Binaural audio is the audio you get when you record sound with two mics placed at the ear locations on a dummy head, just like the Neumann head you see below.
Doing so gives you audio that matches what you’d hear yourself since the microphones are in the same locations as typical ears would be.
To clarify it further, take a look at this wonderful illustration by Rit Rajarshi:
Audio Cues from Binaural Audio
Sound from the bird on the left travels toward the listener’s head. Because one ear is farther from the bird than the other, the sound reaches each ear at a slightly different time (and, since the head is in the way, at a slightly different loudness). Our brains process these differences to work out the relative position of the sound. Putting these cues together for every sound in a scene gives you a “map” of sound that places you in an immersive space. That’s really cool, but it’s still not quite as realistic as it could be. That’s where the head tracking component comes in.
In the real world, objects stay where they are around you, and their sounds come from those locations relative to the position of your head (and therefore your ears). Say you hear the roar of a sports arena behind you. If you turn your head to the left or right, you’re able to focus your hearing toward it, but the arena itself doesn’t change position.
Similarly, sounds should stay in place in the world even after you move. The gif below shows the difference between binaural audio and head-tracked binaural audio. When you rotate your head, the world should not rotate with you. It should stay in place.
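To put a rough number on the timing difference described above, here is a minimal sketch. It assumes a simple spherical-head model (Woodworth's approximation), which is not something this post relies on elsewhere; the head radius, the speed of sound, and the function name are illustrative assumptions rather than measured values.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air at room temperature
HEAD_RADIUS_M = 0.0875      # assumed average head radius, for illustration only


def interaural_time_difference(azimuth_deg, head_radius_m=HEAD_RADIUS_M):
    """Estimate the arrival-time gap between the two ears, in seconds.

    Uses Woodworth's spherical-head approximation:
        ITD = (r / c) * (theta + sin(theta))
    azimuth_deg is the source direction: 0 is straight ahead, positive is
    to the listener's right, and the valid range is -90 to 90 degrees.
    A positive result means the sound reaches the right ear first.
    """
    if not -90.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth_deg must be between -90 and 90")
    theta = math.radians(abs(azimuth_deg))
    itd = (head_radius_m / SPEED_OF_SOUND_M_S) * (theta + math.sin(theta))
    return math.copysign(itd, azimuth_deg)


# A bird directly to the listener's left (-90 degrees).
bird_itd = interaural_time_difference(-90.0)
```

Under these assumptions, a source directly to one side produces a gap of roughly 0.66 milliseconds, and a source straight ahead produces none. The point is only that the cue is tiny, yet it is enough for the brain to work with.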
Why head tracking matters: Binaural Audio vs. Head Tracked Binaural Audio
These pieces combined are what make for a truly immersive experience, and they are what we in the industry call spatial audio.
Another piece of this experience is head-related transfer functions (HRTFs). These are filters that model how sound bounces, scatters, and diffracts around the head and outer ear on its way to the ear canal. They also take interaural distance (the distance between the ears) into account. In short, an HRTF captures how sound is affected by a given head shape, so the audio can be adjusted to make its cues come through as realistically as possible.
By combining all of these components (the binaural recordings, the HRTFs, and the head tracking), you can achieve a fully immersive audio experience that makes you feel like you’re really there.
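The head tracking idea can be sketched in a few lines. This is a simplified illustration that only handles left-right rotation (yaw); the function names and the angle convention (degrees, clockwise, 0 meaning straight ahead) are assumptions made for this example, not part of any particular product's API.

```python
def wrap_degrees(angle_deg):
    """Wrap an angle into the range [-180, 180)."""
    return (angle_deg + 180.0) % 360.0 - 180.0


def relative_azimuth(source_azimuth_world_deg, head_yaw_deg):
    """Direction of a world-fixed source as heard from the rotated head.

    Both angles are measured clockwise in degrees from the direction the
    listener faced at the start. Subtracting the head yaw is what keeps the
    source pinned to the world instead of to the listener's head.
    """
    return wrap_degrees(source_azimuth_world_deg - head_yaw_deg)


# The arena starts directly behind the listener (180 degrees).
# After a 90 degree turn to the right, it should be heard on the right.
arena_after_turn = relative_azimuth(180.0, 90.0)  # 90.0
```

Without head tracking, the renderer would keep using the original 180 degrees no matter how the listener moves, so the arena would follow the head around. A real system would track pitch and roll as well, and would update the angle continuously from the device's motion sensors.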
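As a rough illustration of how an HRTF gets applied, the sketch below filters a mono signal with a pair of head-related impulse responses (HRIRs, the time-domain form of an HRTF), one per ear. The HRIR values here are made-up toy numbers chosen for readability, not measured data, and the function names are hypothetical.

```python
def convolve(signal, kernel):
    """Plain discrete convolution of two lists of samples."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def render_binaural(mono, hrir_left, hrir_right):
    """Filter one mono signal into a left-ear and a right-ear signal."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy HRIRs for a source on the listener's left (assumed values):
# the left ear hears it immediately at full level, while the right ear
# hears it two samples later and at half the level.
TOY_HRIR_LEFT = [1.0, 0.0, 0.0]
TOY_HRIR_RIGHT = [0.0, 0.0, 0.5]

left, right = render_binaural([1.0, 0.5], TOY_HRIR_LEFT, TOY_HRIR_RIGHT)
# left  is [1.0, 0.5, 0.0, 0.0]
# right is [0.0, 0.0, 0.5, 0.25]
```

Measured HRIRs are much longer and differ for every direction and every listener, and a production renderer would typically use a faster convolution method, but the structure is the same: one filter per ear, selected for the direction the sound should appear to come from.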
If you’re in the field and want a head start on your next spatial audio product, CEVA can provide a complete audio solution to fit your needs. If you’re interested in learning more directly from the horse’s mouth, feel free to contact us for more information.