In the past few years, artificial intelligence has emerged as one of the major forces disrupting the way we live and work. Smart algorithms relieve businesses of time- and labor-intensive routines, enabling them to make a huge leap towards increased agility and efficiency. With a wealth of application areas, AI is well on its way to revolutionizing object recognition and tracking, traffic control, and video surveillance, and the sky’s the limit.

With an eye to the capabilities that AI boasts today, tech giants like Alibaba and Huawei believe the technology will go far beyond what we think it can do now. In their ambitious projects, AI covers entire cities with 24/7 video monitoring to detect accidents and traffic collisions and alert the police.


Smaller companies also go all out to keep up with the competition, building AI functionality into their video solutions right at the hardware level so that they work without an internet connection.

Loads of raw disparate data stop being a pain thanks to the lightning-fast and unbiased analysis that AI provides. Huge sets of video footage translate into meaningful insights that improve decision making and help handle enterprise-scale workflows — and that isn’t the pinnacle of AI’s power.

Build Custom AI-Enabled Video Systems

With dangers like gun violence becoming increasingly common worldwide, the challenge of building reliable public safety solutions remains far from solved. Cases in which police officers shoot suspects after mistaking harmless objects in their hands for guns are all too frequent. To stay alert, spot crimes and accidents efficiently, and avoid failures in critical situations, police stations and emergency services can now harness AI-enabled video systems.


Last year, Oxagile’s maturity in the video domain, backed by a pool of computer vision and machine learning engineers, made us a good match for several smart video surveillance projects. To help clients increase the efficiency of their law enforcement solutions and eliminate hours of tiresome live monitoring, our team undertook the development of computer vision and facial recognition functionality that can be easily integrated into a client’s video environment.

Building reliable software that pairs smart algorithms with static cameras is no longer much of an issue, as engineers worldwide have long been developing solutions for it. Yet today the requirements for video processing systems are growing more complex. With loads of video shot on smartphones and drone cameras, the new challenge is to create robust AI-powered software that works with on-the-go recording.

For one of our recent projects, we are building custom computer vision functionality that tackles the problem of detecting and tracking objects in live video from a moving camera. This is no mean feat for quite a number of reasons.

Camera motion implies vibration and changing speed, which badly affect image quality through blur, shadows, obstructions, and background clutter. Even a fixed outdoor camera cannot remain completely static and safe because of weather conditions or intentional damage.

Moreover, nearly every case requires a custom solution tailored to specific applications and business needs. Take, for instance, patrol officers who are to be equipped with body cameras, GPS, and drones while under continuous threat of street shootings and riots. In this regard, last year a US police body camera manufacturer announced the acquisition of an AI startup that develops object recognition software designed to train officers to better detect targets.

With a wealth of low-cost devices that feature advanced imaging functionality, the video processing domain can’t do without a feasible solution for live tracking with a moving camera. Demand for it grows along with the competition between camera manufacturers. Yet, despite numerous moves towards a one-size-fits-all video tracking algorithm, no one has managed to devise one.

So, it’s not for nothing that Oxagile was eager to contribute to this challenging task. For some time now, our R&D team has been digging deep into tracking approaches and algorithms to deliver some meaningful insight into the domain.

Choose an Algorithm that Fits the Bill

At a high level, the most common approach to video tracking is to spot the object of interest in the very first frame and keep tracking it from there with the help of a chosen algorithm. What does it take to detect a moving object with a fixed camera, with or without scene changes around it? You’ll need solid background modelling and subtraction techniques that subtract the background model from the current frame, leaving only the moving objects.
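The fixed-camera case can be illustrated with a minimal sketch. It assumes a running-average background model and tiny synthetic grayscale frames stored as plain Python lists. The function names, the blending factor `alpha`, and the threshold are illustrative assumptions rather than part of any particular library, and production code would typically rely on an image-processing library instead.

```python
# Minimal background-subtraction sketch for a FIXED camera.
# Frames are small grayscale images stored as lists of rows (values 0-255).
# All names, alpha, and the threshold are illustrative assumptions.

def update_background(background, frame, alpha=0.1):
    """Blend the new frame into the running-average background model."""
    return [
        [(1 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]

def foreground_mask(background, frame, threshold=30):
    """Mark pixels that differ from the background model by more than threshold."""
    return [
        [abs(f - b) > threshold for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]

def make_frame(width, height, obj_x=None, obj_y=None):
    """Synthetic frame: dark background (10) with an optional bright pixel (200)."""
    frame = [[10] * width for _ in range(height)]
    if obj_x is not None:
        frame[obj_y][obj_x] = 200
    return frame

# Learn the background from a few empty frames.
background = [[float(v) for v in row] for row in make_frame(6, 4)]
for _ in range(5):
    background = update_background(background, make_frame(6, 4))

# An object appears at column 2, row 1; everything else matches the model.
mask = foreground_mask(background, make_frame(6, 4, obj_x=2, obj_y=1))
moving_pixels = [(x, y) for y, row in enumerate(mask) for x, on in enumerate(row) if on]
print(moving_pixels)  # [(2, 1)]
```

Because the model is updated gradually, slow scene changes such as shifting daylight are absorbed into the background, while anything that differs sharply from the model is flagged as foreground.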

But when the camera isn’t static, this approach just won’t cut it, as inaccurate motion compensation hampers the modelling of background and foreground pixels. This prompted us to run extensive tests of the algorithms most frequently used for live object tracking.
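To make the problem concrete, here is a rough sketch under simplifying assumptions: a synthetic textured scene, a camera that only pans right by a whole number of pixels, and hypothetical helper names. It shows that plain frame differencing flags every pixel once the camera moves, and that even a crude global-shift estimate recovers the single pixel that really changed. Real camera motion involves rotation, zoom, and sub-pixel shifts, which is where compensation errors creep in.

```python
# Why fixed-camera background subtraction breaks when the camera moves,
# and a crude integer-shift compensation. Assumptions: a horizontal pan
# to the right only, a synthetic scene, and hypothetical helper names.

def view(scene, offset, width):
    """Crop what a camera panned by `offset` columns would see."""
    return [row[offset:offset + width] for row in scene]

def changed_pixels(reference, frame, shift=0, threshold=30):
    """Count differing pixels after aligning `frame` to `reference` by `shift` columns."""
    count = 0
    for ref_row, row in zip(reference, frame):
        for x in range(len(row) - shift):
            # Frame column x shows the same scene point as reference column x + shift.
            if abs(row[x] - ref_row[x + shift]) > threshold:
                count += 1
    return count

def estimate_shift(reference, frame, max_shift=3):
    """Pick the pan (in columns) with the lowest mean absolute difference in the overlap."""
    def cost(shift):
        width = len(frame[0]) - shift
        total = sum(
            abs(row[x] - ref_row[x + shift])
            for ref_row, row in zip(reference, frame)
            for x in range(width)
        )
        return total / (width * len(frame))
    return min(range(max_shift + 1), key=cost)

# Textured synthetic scene, 12 columns by 4 rows.
scene = [[(37 * x + 59 * y) % 200 for x in range(12)] for y in range(4)]

reference = view(scene, 0, 8)   # background captured before the camera moved
moved = view(scene, 2, 8)       # same scene after a 2-pixel pan
moved[1][3] = 255               # one genuinely new bright pixel (the "object")

naive = changed_pixels(reference, moved)              # no compensation
shift = estimate_shift(reference, moved)              # estimated camera pan
compensated = changed_pixels(reference, moved, shift)
print(naive, shift, compensated)  # 32 2 1
```

Without compensation all 32 pixels look like foreground. After aligning the frames by the estimated pan, only the one pixel that actually changed is left.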

By running Kernelized Correlation Filter, Dlib, and Real-Time Recurrent Regression networks in different tracking environments, we’ve worked out a handful of insights on how to make the most of them. To cut a long story short, let’s get into the major takeaways.

Adaptive to the specifics of tracking a maneuvering object, Kernelized Correlation Filter is a reliable approach to handling noise, obstructions, and changes in the target’s appearance. In a complex recording environment, it ensures better precision and is capable of keeping focus on the target for a long time.

A go-to algorithm for tracking objects that move in a predictable linear manner across an area with no large obstructions, Dlib nevertheless fails to handle camera motion. Though Dlib can keep up well with, say, a marathon race filmed by a static camera, the algorithm lacks the power to track it with a smartphone or a drone-mounted camera.

Re3, or Real-Time Recurrent Regression networks for visual tracking of generic objects, is an approach that performs well under changing lighting conditions and frequent obstructions, even with an unsteady or shaking camera that moves fast.

Only Re3 managed to regain tracking focus in cases when the target went completely out of view for a short time. Yet when it comes to objects of the same kind, chances are that Re3 can fail. For example, if a tracked person blends in with the crowd, the algorithm is likely to lose the target.
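Whatever the algorithm, the outer loop is the one described earlier: lock onto the target in the first frame, then update its position frame by frame. The sketch below illustrates that loop only. Its matcher is plain sum-of-squared-differences template matching, a deliberately simple stand-in that is not KCF, Dlib, or Re3, and the class name, the search radius, and the synthetic frames are assumptions made for illustration.

```python
# Generic "lock on in the first frame, then update" tracking loop.
# The matcher is plain sum-of-squared-differences (SSD) template matching:
# a deliberately simple stand-in, NOT KCF, Dlib, or Re3.
# Class and function names are hypothetical.

def crop(frame, x, y, w, h):
    """Cut a w-by-h patch whose top-left corner is at (x, y)."""
    return [row[x:x + w] for row in frame[y:y + h]]

def ssd(patch_a, patch_b):
    """Sum of squared differences between two equally sized patches."""
    return sum(
        (a - b) ** 2
        for row_a, row_b in zip(patch_a, patch_b)
        for a, b in zip(row_a, row_b)
    )

class TemplateTracker:
    def __init__(self, first_frame, box, search_radius=2):
        self.box = box                      # (x, y, w, h) of the target
        self.template = crop(first_frame, *box)
        self.search_radius = search_radius

    def update(self, frame):
        """Search around the last position and move the box to the best match."""
        x0, y0, w, h = self.box
        r = self.search_radius
        height, width = len(frame), len(frame[0])
        best = None
        for y in range(max(0, y0 - r), min(height - h, y0 + r) + 1):
            for x in range(max(0, x0 - r), min(width - w, x0 + r) + 1):
                score = ssd(self.template, crop(frame, x, y, w, h))
                if best is None or score < best[0]:
                    best = (score, x, y)
        _, x, y = best
        self.box = (x, y, w, h)
        return self.box

def make_frame(obj_x, obj_y, width=10, height=8):
    """Synthetic frame: black background with a textured 2x2 object."""
    frame = [[0] * width for _ in range(height)]
    frame[obj_y][obj_x:obj_x + 2] = [200, 150]
    frame[obj_y + 1][obj_x:obj_x + 2] = [100, 250]
    return frame

# Initialize on the first frame, then follow the object through three more.
tracker = TemplateTracker(make_frame(1, 2), box=(1, 2, 2, 2))
path = [tracker.update(make_frame(x, y)) for x, y in [(2, 2), (3, 3), (5, 4)]]
print(path)  # [(2, 2, 2, 2), (3, 3, 2, 2), (5, 4, 2, 2)]
```

A matcher this simple also hints at the failure mode mentioned above: if a second object with the same appearance enters the search window, both candidates score equally well and the tracker has no way to tell them apart.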

Derive Instant Benefit from Video Processing Solutions

Our growing expertise in developing computer vision and machine learning solutions for global businesses makes us sure that pairing video solutions with reliable tracking algorithms can translate into tangible gains for your business right away. If you need a video system that can help you save man-hours, cut operating expenses, and stay ahead of the competition, just contact us, and we’ll design a value-based roadmap for it.