Computer vision on the edge is a good start, but it isn't enough. It must also handle distorted images from wide-angle lenses, image stabilization, low-light conditions and the other imaging functions we already see in more advanced applications. And it has to keep doing all of this at low cost and low power.
Consumers and businesses have always wanted more for less and they always will; this drives our markets and creates unbounded opportunities for technology innovators. In the systems world, more for less always means one thing – higher levels of integration. Pulling more functionality into a single chip reduces cost, making solutions more competitive in price-sensitive markets, and it also reduces power consumption, extending battery life and reducing maintenance or recharging hassle for mobile or remote applications.
We’re already comfortable with AI-fueled computer vision (CV) in many edge devices. We see multiple ADAS applications, for collision avoidance (both driving forward and reversing), lane departure detection and even more in autonomous and semi-autonomous vehicles. Drones no longer depend on us to navigate or to sense and avoid obstacles. AR and VR devices need intelligence combined with vision to sense our position and pose in order to ensure a quality experience with low latency (to minimize motion sickness). Body-mounted sports/action cameras are trending toward using AI to capture only interesting sequences, limiting storage needs. And facility security applications can similarly capture only significant sequences, such as a person moving in the frame, also reducing storage requirements but, more importantly, eliminating the need for constant human monitoring.
Why would we want to add image processing to these capabilities? Start with any case where you want a wide-angle view, perhaps even a 360° view. We’re already familiar with the slightly spooky overhead view option we can select on the forward/rear view screen in recent cars. How do they do that? By stitching together two or more camera views, then correcting the result to produce a reasonably undistorted image. There are clearly varying levels of quality in that rendering – some of these images are significantly less convincing than others. A more important application uses the rear-view camera to detect obstacles when backing up and to automatically trigger braking.
This gets us closer to the importance of image processing. OEMs don’t want to use multiple rear-view cameras (which would be expensive), so instead they use a single camera with a wide-angle lens. You’ve probably seen these fisheye lens images before – interesting but highly distorted. Correcting for that distortion (real-time adaptive de-warping) isn’t just for aesthetics; it’s also important before processing the image through neural networks for object detection. The same consideration applies to many security applications: if a single fisheye lens with correction can replace multiple cameras, the whole device is cheaper.
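To make the de-warping idea concrete, here is a minimal numpy sketch of the underlying geometry. It assumes the common equidistant fisheye model (radius proportional to ray angle) and that both images share the same center; real lenses need a calibrated distortion model, and production de-warping uses interpolated, hardware-accelerated remapping rather than this nearest-neighbour lookup.

```python
import numpy as np

def fisheye_dewarp_map(w, h, f_fish, f_pin):
    """Build an (h, w, 2) lookup table mapping each pixel of an
    undistorted pinhole output image to its source pixel in an
    equidistant-fisheye input of the same size.
    Assumes both images share the same principal point (center)."""
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    dx, dy = xs - cx, ys - cy
    # Pinhole model: a pixel at radius r_pin views a ray at angle
    # theta = atan(r_pin / f_pin) from the optical axis.
    r_pin = np.hypot(dx, dy)
    theta = np.arctan2(r_pin, f_pin)
    # Equidistant fisheye maps that angle to radius linearly.
    r_fish = f_fish * theta
    scale = np.divide(r_fish, r_pin, out=np.ones_like(r_pin), where=r_pin > 0)
    return np.stack([cx + dx * scale, cy + dy * scale], axis=-1)

def apply_map(src, lut):
    """Nearest-neighbour remap; out-of-range samples clamp to the border."""
    h, w = src.shape[:2]
    x = np.clip(np.round(lut[..., 0]).astype(int), 0, w - 1)
    y = np.clip(np.round(lut[..., 1]).astype(int), 0, h - 1)
    return src[y, x]
```

Because the lookup table depends only on the lens, it can be computed once and reused for every frame, which is what makes real-time de-warping feasible on an embedded core.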
Image stabilization is another important application, especially for sports-oriented consumers. If you’re zipping down a ski slope or biking down a rocky mountainside, you don’t want your head-mounted camera recording every jerk and bounce; most of us would want all that jerky action smoothed out. Image processing can also help with contrast enhancement in low-light conditions and can be useful in autofocus applications. While these functions require different types of processing elements, combining them in one efficient embedded processor allows further performance and power optimizations within the core and, even more importantly, can reduce cost for many price-sensitive applications.
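The core of digital stabilization is path smoothing: estimate the camera's motion frame to frame, smooth that trajectory, and shift each frame onto the smooth path. The sketch below is a simplified illustration assuming per-frame (dx, dy) translations have already been estimated (e.g. by feature tracking); real stabilizers also handle rotation and rolling shutter.

```python
import numpy as np

def stabilize_offsets(frame_shifts, radius=3):
    """Given per-frame (dx, dy) camera shifts, return per-frame
    correction offsets that keep the smoothed camera path and
    cancel the high-frequency jitter."""
    path = np.cumsum(frame_shifts, axis=0)          # raw camera trajectory
    kernel = np.ones(2 * radius + 1) / (2 * radius + 1)
    pad = np.pad(path, ((radius, radius), (0, 0)), mode="edge")
    # Moving-average each coordinate of the trajectory.
    smooth = np.stack([np.convolve(pad[:, i], kernel, mode="valid")
                       for i in range(path.shape[1])], axis=1)
    return smooth - path  # shift each frame by this amount
```

A steady pan yields near-zero corrections (the smooth and raw paths coincide), while jerky motion produces offsets that cancel the bounce; the crop margin needed to hide those shifts is one reason stabilized footage has a slightly narrower field of view.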
This all sounds good, but isn’t there a risk of such a tightly integrated solution becoming obsolete on the next product rev, or worse yet on the next training update? After all, neural net algorithms famously evolve very quickly. How do you future-proof against these kinds of changes? First, you definitely need the support of a graph compiler able to map from industry-standard learning platforms and optimize to your target networks with your selections for fixed-point widths per layer. And second, you need to be able to redesign that neural network architecture as needed and to simulate your updates to characterize power and performance, rerunning the compiler to remap the trained network to your updated target.
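The per-layer fixed-point widths mentioned above come down to choosing a scale per layer so its weights (and activations) fit the available integer range. The snippet below is a hypothetical illustration of one common policy, not CEVA's actual graph compiler: symmetric quantization with a power-of-two scale, which keeps rescaling down to simple shifts in hardware.

```python
import numpy as np

def quantize_layer(weights, bits=8):
    """Symmetric per-layer fixed-point quantization: pick a
    power-of-two scale so the largest weight fits in a signed
    `bits`-bit integer. Returns (int weights, fractional bits)."""
    qmax = 2 ** (bits - 1) - 1
    # Largest frac_bits such that max|w| * 2**frac_bits <= qmax.
    frac_bits = int(np.floor(np.log2(qmax / np.max(np.abs(weights)))))
    q = np.round(weights * 2.0 ** frac_bits)
    q = np.clip(q, -qmax - 1, qmax).astype(np.int32)
    return q, frac_bits  # dequantize with q / 2**frac_bits
```

A layer whose weights span a small range gets more fractional bits (finer resolution), while a wide-range layer gets fewer; doing this per layer rather than globally is what the per-layer width selection buys you in accuracy at a given bit budget.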
It would also be ideal to have an abundance of extra headroom for image processing, so you can leverage the latest advances as they become available. Check out how the CEVA NeuPro-S solution can make this possible.
Published on Embedded Vision Alliance