Much of what you expect on your phone or smart speaker – voice control, point and click, high-quality audio, even video calls – is coming to your TV experience, enabled by similar technologies and more.
Just as telephones graduated from the tabletop to hands-free, your television will soon move off your wall and closer to your hands. Before that transformation is complete, it has to pass through the remote stage, and it is gradually doing so as both the television and the remote become more intelligent.
The impetus to keep researching and developing these two devices (television and remote) lies in the attractive market opportunities that abound in the sector. Two separate research reports estimate the compound annual growth rate (CAGR) for smart TVs at 16 percent and 21 percent. These five-year estimates, running from 2020 to 2025, cover only major providers such as Samsung, Sony, LG and Panasonic; growth among cable and satellite providers is not included. That growth rate is the motivation manufacturers need to make TVs more intelligent.
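To put those two rates in perspective, a constant CAGR of r over n years multiplies the starting market size by (1 + r)^n. The short sketch below works that out for the 16 and 21 percent figures over five years. The function names are made up for illustration, and the starting size is normalized to 1.0 because the reports' absolute figures are not given here.

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Return end_value / start_value for a constant compound annual growth rate."""
    return (1.0 + cagr) ** years


def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Invert the compounding formula to recover the annual rate."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    for rate in (0.16, 0.21):  # the two rates quoted in the text
        print(f"{rate:.0%} for 5 years -> x{growth_multiple(rate, 5):.2f}")
```

In other words, 16 percent a year compounds to roughly 2.1 times the starting market over five years, and 21 percent to roughly 2.6 times.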
TVs have been the electronic center of our homes since long before our new-found fascination with smart homes. For most of us they still are. They occupy a big footprint not just on our walls but also in our psyches and they’re not going to relinquish that hold without a fight. They continue to become smarter, in how we control them by voice or by pointing, in the audio experience and in video calling.
Voice activation is intuitive and has become very popular, whether in the TV itself or the remote control, to find channels and movies, change the volume or pause streaming. Pointing with the remote is also big. Now that a TV has the brains of a computer or smartphone, we need to be able to select options on the screen more intelligently than today’s antiquated solutions allow (moving a cursor using arrow keys – anyone remember DOS?). Support for advanced audio decoders like Dolby MS12 is also becoming essential to deliver a unified, state-of-the-art audio experience from multiple input formats.
Effective voice control with a TV presents some of the same challenges seen by a smart speaker, plus some new ones. Common needs start with voice activity detection to enable power-down modes, then keyword detection for the wake word or phrase (Alexa, Hey Google, something custom), followed by high-quality speaker tracking and command recognition.
For TV-based solutions, voice pickup typically demands multiple microphones to support far-field speaker separation and acoustic echo cancellation (AEC). It also needs to lower the TV speakers’ volume when listening to a command and separate that echo from the person speaking the command.
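As a rough illustration of what echo cancellation involves, the sketch below uses a normalized least-mean-squares (NLMS) adaptive filter, a common textbook approach; nothing here says it is what any particular TV uses. The function names, the tap count, the step size and the three-tap "room" in `simulate_echo` are all assumptions chosen for the demo.

```python
import random


def simulate_echo(far_end, path):
    """Hypothetical room model: the mic hears the speaker signal filtered by `path`."""
    return [
        sum(path[k] * far_end[n - k] for k in range(len(path)) if n - k >= 0)
        for n in range(len(far_end))
    ]


def nlms_echo_cancel(far_end, mic, taps=8, step=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the loudspeaker echo from the mic signal.

    far_end: samples sent to the TV speakers (the reference signal).
    mic:     samples captured by the microphone (echo plus any near-end speech).
    Returns the residual left after the estimated echo is removed.
    """
    weights = [0.0] * taps   # current estimate of the echo path (FIR filter)
    history = [0.0] * taps   # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate
        # Normalizing by the reference power keeps adaptation stable
        # whether the TV is playing loudly or quietly.
        power = sum(h * h for h in history) + eps
        weights = [w + step * error * h / power for w, h in zip(weights, history)]
        residual.append(error)
    return residual


if __name__ == "__main__":
    random.seed(0)
    tv_audio = [random.uniform(-1.0, 1.0) for _ in range(2000)]
    mic = simulate_echo(tv_audio, [0.6, 0.3, -0.1])
    cleaned = nlms_echo_cancel(tv_audio, mic)
    print("echo power, last 200 samples:", sum(v * v for v in mic[-200:]))
    print("residual power, last 200 samples:", sum(v * v for v in cleaned[-200:]))
```

The demo contains only echo and no talker, so the residual should shrink toward zero as the filter converges. A real product also has to freeze or slow adaptation while someone is speaking, which this sketch does not attempt.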
Sound bars and other speakers for home theater setups add to the complexity. Inside the TV, speaker positions are fixed, so manufacturers can develop predictable ways to compensate for the audio they create. Other speakers are placed wherever you set them up, and the sound bar can be moved around, all of which requires much more sophisticated echo cancellation.
We all know about AI-based speech recognition solutions like Alexa for voice assistants. Still, these are consumer devices, very price-sensitive and sold in very competitive markets. Many TV makers are going to want to add differentiation, maybe using Alexa in premium models, maybe relying on a proprietary wake word or phrase that emphasizes their own brands.
Today’s audio content can come in a bewildering variety of compression formats and channels: from AAC and HE-AAC, through Dolby Digital and Dolby Digital Plus, and up to Dolby AC-4 and MPEG-H. It is delivered over broadcast, as files, over-the-top (via the internet), as video-on-demand and through pay-TV operators.
Dolby designed its multistream decoder, MS12, to manage this audio tower of Babel through consistent delivery with advanced audio processing: bass enhancement, speaker tuning, virtual surround and volume leveling, which aims to minimize viewer annoyance by controlling loudness levels and the shifts between programs, advertisements and channels. Dolby MS12 is going to be essential in today’s intelligent TVs.
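To make the idea of volume leveling concrete, here is a deliberately simple sketch: measure the level of each block of audio and glide a gain toward a target so that a loud advertisement is pulled down gradually rather than abruptly. This is not Dolby's algorithm, which is not described here. It uses plain RMS level rather than a perceptual loudness measure, and the names, the -23 dB target and the 1 dB-per-block limit are assumptions for illustration.

```python
import math


def tone(amplitude, samples=480, cycles=10):
    """Test signal: a sine block with a whole number of cycles."""
    return [amplitude * math.sin(2 * math.pi * cycles * n / samples)
            for n in range(samples)]


def rms_dbfs(block):
    """RMS level of a block in dB relative to full scale (1.0)."""
    mean_square = sum(s * s for s in block) / len(block)
    return 10.0 * math.log10(mean_square + 1e-12)


def level_blocks(blocks, target_db=-23.0, max_step_db=1.0):
    """Scale each block toward target_db, limiting the gain change per block."""
    gain_db = 0.0
    leveled = []
    for block in blocks:
        wanted_db = target_db - rms_dbfs(block)
        # Move only part of the way so the gain glides instead of jumping.
        gain_db += max(-max_step_db, min(max_step_db, wanted_db - gain_db))
        gain = 10.0 ** (gain_db / 20.0)
        leveled.append([s * gain for s in block])
    return leveled


if __name__ == "__main__":
    program = [tone(0.1)] * 20        # quiet program material
    advert = [tone(0.3)] * 20         # louder advertisement
    out = level_blocks(program + advert)
    for i in (19, 20, 39):
        print(f"block {i}: {rms_dbfs(out[i]):.2f} dBFS")
```

The first advertisement block still comes out louder than the target, and the level then settles back over the following blocks. That trade-off between reaction speed and audible gain pumping is the kind of tuning a production leveler has to get right.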
There’s more. 3D audio is getting hot. Not just surround sound, but mimicking sound sources moving around you, like a helicopter flying overhead, using technologies such as Dolby Atmos. This and other special effects will enhance gaming and movies by simulating more realistic experiences for players and viewers.
Although they are becoming more intelligent, remotes will probably be with us for a long time, but with substantially fewer buttons. They will be ideal pointers to select and click on option buttons on the screen as you’re lounging in your reclining armchair. They will also be better for voice control in a noisy environment.
Voice control through a remote has some of the same expectations as direct control of the TV, but with different constraints. First, remotes are battery operated, so they need to be very power-sensitive. Active listening should be off until a voice is detected. To accomplish that, voice activity detection might be built into a micro-electromechanical system (MEMS) microphone, or you might choose to implement ultra-low-power detection in hardware or in software.
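A software voice activity detector can be as simple as an energy gate. The sketch below marks a frame as active when its mean-square energy crosses a threshold, then stays awake for a few more frames to bridge short pauses. It is a minimal example under assumed values: the function name, threshold and hangover length are hypothetical, and real detectors are usually more robust than a bare energy check.

```python
def energy_vad(frames, threshold=0.01, hangover=3):
    """Return one boolean per frame: True while speech is assumed present.

    frames:    list of sample lists (for example 10 ms of audio each).
    threshold: mean-square energy above which a frame counts as active
               (hypothetical value; a real device would calibrate it).
    hangover:  extra frames to stay awake after the energy drops.
    """
    flags = []
    remaining = 0
    for frame in frames:
        energy = sum(s * s for s in frame) / len(frame)
        if energy > threshold:
            remaining = hangover
            flags.append(True)
        elif remaining > 0:
            remaining -= 1
            flags.append(True)
        else:
            flags.append(False)
    return flags


if __name__ == "__main__":
    quiet = [0.001] * 160
    loud = [0.5] * 160
    print(energy_vad([quiet, quiet, loud, loud] + [quiet] * 5))
```

Everything downstream (keyword detection, command recognition, the radio link to the TV) can stay powered down while the flags are False, which is where the battery savings come from.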
Second, voice pickup doesn’t need the same far-field support that a TV needs, so it might be able to work with one microphone, though if you’re not holding the remote close you might need two microphones with audio beamforming to zoom in on the speaker. Noise cancellation in general is going to be challenging, since the remote is located near the TV without the option to perform native AEC.
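The simplest form of two-microphone beamforming is delay-and-sum: shift one microphone's signal so the talker lines up in both, then average. The talker adds coherently while uncorrelated noise partly cancels. The sketch below assumes the inter-microphone delay is a known whole number of samples; the function name and the demo values are illustrative only.

```python
def delay_and_sum(mic_a, mic_b, delay_b):
    """Align mic_b to mic_a by an integer sample delay and average the two.

    delay_b: how many samples later the talker reaches mic_b than mic_a
             (non-negative, and assumed known or estimated elsewhere).
    """
    out = []
    for n in range(len(mic_a)):
        m = n + delay_b
        b = mic_b[m] if m < len(mic_b) else 0.0  # zero-pad past the end
        out.append(0.5 * (mic_a[n] + b))
    return out


if __name__ == "__main__":
    import math
    import random

    random.seed(1)
    count, delay = 4000, 2
    speech = [math.sin(2 * math.pi * n / 40) for n in range(count)]
    noise_a = [random.uniform(-1.0, 1.0) for _ in range(count)]
    noise_b = [random.uniform(-1.0, 1.0) for _ in range(count + delay)]
    mic_a = [speech[n] + noise_a[n] for n in range(count)]
    mic_b = [(speech[m - delay] if m >= delay else 0.0) + noise_b[m]
             for m in range(count + delay)]
    out = delay_and_sum(mic_a, mic_b, delay)
    noise_in = sum(v * v for v in noise_a) / count
    noise_out = sum((o - s) ** 2 for o, s in zip(out, speech)) / count
    print("noise power ratio after beamforming:", noise_out / noise_in)
```

With independent noise on each microphone the ratio should come out near one half. Noise from the TV itself is correlated between the two microphones, so it would not cancel this neatly, which is part of why the missing AEC reference makes the remote case hard.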
Published on EEWeb.