When ARM CPU cores were first validated through adoption by some of the premier names in computing, including Apple, usage exploded, especially for mobile applications. In retrospect the advantage was obvious: any device could be made much more flexible and feature-rich with an embedded processor. At the same time, that capability could be upgraded in software: a single hardware platform could drive multiple product releases through software-only upgrades.
These compute engines are very flexible and are perfect for many of the management and general-purpose compute tasks in our smartphones and other mobile products, but that generality comes with a drawback. Certain operations would run far too slowly on a general-purpose processor, and consume far too much power, to be practical. The modem in the wireless communication part of your smartphone was an early example. It has to process radio signals in real time, dealing not with the familiar digital words and bits used inside the computing part of the phone, but with a digitized version of the continuously varying analog signals used in radio transmission and reception.
Digital signal processors (DSPs) are designed for this kind of analysis. They have built-in fixed-point or floating-point number formats suited to digitized signals, and they have strong support for the math functions needed in signal processing, such as multiply-accumulate (MAC) operations. They’re also optimized to process streaming data, rather than the more batch-oriented processing common in conventional compute, an essential feature in this case for handling continuous radio transmission and reception.
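To make the MAC and streaming ideas concrete, here is a minimal sketch in plain C of a finite impulse response (FIR) filter, a common signal-processing building block that is not named in this article but is built almost entirely from multiply-accumulate steps. The names fir_filter, fir_init, fir_process and FIR_MAX_TAPS are hypothetical, chosen for illustration only. A real DSP toolchain would typically map the inner loop onto dedicated MAC hardware rather than run it as written.

```c
#include <assert.h>
#include <stddef.h>

#define FIR_MAX_TAPS 32   /* illustrative upper bound, not from the article */

/* Hypothetical filter state, named for this sketch only. */
typedef struct {
    const float *coeffs;          /* filter coefficients, one per tap */
    size_t num_taps;              /* number of taps in use */
    float history[FIR_MAX_TAPS];  /* circular buffer of recent input samples */
    size_t head;                  /* position of the newest sample */
} fir_filter;

/* Prepare the filter: remember the coefficients and clear the history. */
void fir_init(fir_filter *f, const float *coeffs, size_t num_taps)
{
    assert(f != NULL && coeffs != NULL);
    assert(num_taps > 0 && num_taps <= FIR_MAX_TAPS);
    f->coeffs = coeffs;
    f->num_taps = num_taps;
    f->head = 0;
    for (size_t i = 0; i < FIR_MAX_TAPS; i++) {
        f->history[i] = 0.0f;
    }
}

/* Streaming step: take one input sample, return one output sample. */
float fir_process(fir_filter *f, float sample)
{
    /* Store the new sample over the oldest one in the circular buffer. */
    f->head = (f->head + 1) % f->num_taps;
    f->history[f->head] = sample;

    /* One multiply-accumulate per tap, walking from newest to oldest. */
    float acc = 0.0f;
    size_t idx = f->head;
    for (size_t k = 0; k < f->num_taps; k++) {
        acc += f->coeffs[k] * f->history[idx];
        idx = (idx == 0) ? f->num_taps - 1 : idx - 1;
    }
    return acc;
}
```

Calling fir_process once per incoming sample is what makes this a streaming computation: the filter never needs the whole signal in memory, only the last few samples.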
Audio processing needs share many of the same features seen in wireless signal processing. This application of DSPs became common in high-end audio applications such as equalization and dynamic range compression (for example Dolby compression), then increasingly in functions like the noise-cancelling headphones which allow you to sleep undisturbed during your flight.
Then AI took off, initially only in data centers but now more and more in mobile and other edge applications. Our cars can now detect pedestrians and potential collisions, and they can detect lane markings to guide steering in a basic form of self-driving. We can control our TVs or our smart speakers through voice commands, to find a song or a movie or to lower or raise the volume. We can even control the GoPro on our bike safety helmet through voice commands to start or stop taking pictures.
All of these capabilities depend on processing streaming data (voice) or images (camera still images) or possibly both (video), each in real time or very close to it. Take audio processing first. You need to capture a high-quality streaming audio signal, which involves audio beamforming from multiple microphones, echo cancellation and noise suppression, all areas where there are already years of experience in DSP implementations.
Then you must recognize commands using a trained neural network, the basis of almost all of these AI techniques. These algorithms look very different from those you would run on a CPU; and while they can run on a CPU, they would be slow and run down the battery quickly. A better approach is to program the neural net on an architecture which offers a high level of parallelism, allowing many computations to run at the same time rather than serially as on a CPU. This is another core strength of a DSP – parallelism in computation.
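To see where that parallelism comes from, consider a minimal sketch in plain C of one fully connected neural network layer with a ReLU activation. The function name dense_relu and the row-major weight layout are assumptions made for this illustration, not something taken from a particular DSP library. Each output value is its own chain of multiply-accumulate steps and depends on no other output, which is what allows hardware with several MAC units to compute many of them at once.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical layer function for illustration:
 *   out[j] = max(0, bias[j] + sum over i of weights[j * n_in + i] * in[i])
 * Weights are stored row by row, one row per output.
 */
void dense_relu(const float *weights, const float *bias, const float *in,
                float *out, size_t n_in, size_t n_out)
{
    assert(weights != NULL && bias != NULL && in != NULL && out != NULL);
    assert(n_in > 0 && n_out > 0);

    /* Every pass of this outer loop is independent of the others,
       so parallel hardware can work on several outputs at a time. */
    for (size_t j = 0; j < n_out; j++) {
        float acc = bias[j];
        for (size_t i = 0; i < n_in; i++) {
            acc += weights[j * n_in + i] * in[i];   /* multiply-accumulate */
        }
        out[j] = (acc > 0.0f) ? acc : 0.0f;         /* ReLU activation */
    }
}
```

On a CPU this loop nest runs largely one step after another. On a processor with wide MAC resources the same arithmetic can be spread across those units, which is the difference the paragraph above describes.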
You might wonder if, despite all these advantages, DSPs may be simply too complex to use to be adopted by anyone other than the specialists who have no choice but to use them. Certainly they’re not quite as simple to use as CPUs, but the differences are not so big. You write C code for both, though you need to be a little more thoughtful in the code you write for a DSP to take full advantage of its performance.
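As one illustration of what "a little more thoughtful" can mean, the sketch below writes the same dot product twice in standard C. The function names are hypothetical, and whether the second form actually runs faster depends entirely on the target DSP and its compiler, so treat it as an example of the style rather than a recipe. The second version uses the C99 restrict qualifier to promise that the inputs do not overlap, and keeps four independent accumulators so that a compiler is free to schedule the multiply-accumulates side by side.

```c
#include <assert.h>
#include <stddef.h>

/* Straightforward version: a single accumulator forms one long
   chain in which every step waits for the one before it. */
float dot_simple(const float *a, const float *b, size_t n)
{
    assert(a != NULL && b != NULL);
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        acc += a[i] * b[i];
    }
    return acc;
}

/* More deliberate version: restrict tells the compiler the arrays do not
   alias, and four separate accumulators expose independent work. */
float dot_unrolled(const float *restrict a, const float *restrict b, size_t n)
{
    assert(a != NULL && b != NULL);
    float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
    size_t i = 0;

    /* Main loop: four multiply-accumulates per pass. */
    for (; i + 4 <= n; i += 4) {
        acc0 += a[i]     * b[i];
        acc1 += a[i + 1] * b[i + 1];
        acc2 += a[i + 2] * b[i + 2];
        acc3 += a[i + 3] * b[i + 3];
    }
    /* Tail loop: handle the remaining zero to three elements. */
    for (; i < n; i++) {
        acc0 += a[i] * b[i];
    }
    return (acc0 + acc1) + (acc2 + acc3);
}
```

Both functions compute the same sum of products. Because floating-point addition is order-sensitive, the two can differ in the last bits for general inputs, which is one of the trade-offs to keep in mind when restructuring code this way.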
As for widespread adoption, every radio on your phone – Bluetooth, Wi-Fi and cellular – uses one or more DSPs. Bluetooth earbuds use DSPs, for the Bluetooth and also the audio. Many smart speakers use DSPs. Voice-controlled remotes use DSPs. Home security systems use DSPs to detect anomalous movement on cameras and unusual sounds such as a dog barking or breaking glass. Smart sensors in your car use DSPs to detect forward and backup hazards and to detect lane markings.
Why not use GPUs for all these functions? GPUs are indeed very well known, especially for AI, and have been widely used in data centers for neural net training. But they’re too big, too power-hungry and too expensive for many edge applications. There’s a big push to move more AI functions to edge devices for reasons of power, security and privacy, but these have to be very cost-effective solutions. In most cases there’s little appetite to add significantly to the cost of the total solution (car, TV, home security).
Which is why embedded DSPs are getting everywhere. You can add voice control, object detection, audio quality control and much more to your product at a low cost and at low power and still with the flexibility of software programmability. They won’t replace CPUs for management and general processing, but it looks like they’re taking over everything to do with smart audio and video/imaging.
This blog is the first in a series of three. Stay tuned for the next post: “When a DSP beats a hardware accelerator”.
Published on Embedded.com