The story in brief

The client’s portrait

A Fortune 500 corporation with thousands of technology patents and dozens of subsidiaries around the world.

Their quest

Developing a comprehensive computer vision tool tailored to the specific needs of the police force.

Our answer

An AI-driven computer vision solution that extracts all unique faces, objects, and vehicles from camera footage, provides specialized video editing tools, and produces detailed reports with metadata for every video frame.

Big picture of the solution's wins

80.7% person identification accuracy during real-time tracking and over 95% object detection accuracy with unstable footage

Smart logic that allows detecting up to 400% more object instances than most competitors on the market

Ultra-fast 30 fps HD-quality footage processing during video redaction

Ability to detect virtually any object 20 px or larger in the provided footage

Cutting video processing time by an average of 98.67%, which results in a dramatic boost in police officer productivity

Intelligent mode that flags every dubious video segment and gives actionable hints for reaching 100% object detection accuracy

Business challenges dictating the project's execution

The project aimed to help the police minimize the time and effort spent on filtering and manually correcting video evidence during investigations and court proceedings.
Given this law enforcement use case, the task posed several challenges:

Unstable video handling

Ingestion and processing of video from body-worn and in-vehicle cameras, including live feeds, shaky footage, and footage filmed in adverse environmental conditions.

Precision in recognition

Utmost accuracy in face and object detection and identification.

Fast video search

Multi-faceted search in the video library, e.g., by race, gender, clothing, headgear, tattoos, behavior, and more.

Secure evidence editing

Reliable evidence redaction tool capable of blurring a certain face, object, or vehicle from every single frame of a video.

Assembling the right team to hit the targets

A limited timeframe added an extra layer of complexity, yet Oxagile curated an experienced, well-balanced team that achieved great levels of productivity to finish the project right on time:

  • A deep learning engineer
  • Computational mathematics experts
  • A data analysis expert
  • Systems integration specialists

Solution capabilities making it the core of investigations

The client received a powerful AI-driven computer vision platform designed to analyze camera footage to precisely detect, identify, and track faces, objects, and vehicles.

Automated object blurring

This tool allows users to have any face or object of their choice blurred in every frame of the video. This process is critical for witness protection in court and was previously done by hand, which consumed a lot of time and increased the risk of human error.
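As a rough illustration of the idea, the sketch below redacts one tracked object in every frame where it appears. It is not the product's implementation: the frame representation (rows of grayscale integers), the `redact_region` and `redact_video` helpers, and the choice to fill the region with its mean intensity instead of a true blur are all assumptions made to keep the example small.

```python
from typing import Dict, List, Tuple

Frame = List[List[int]]          # hypothetical: grayscale image as rows of 0-255 ints
Box = Tuple[int, int, int, int]  # hypothetical: (x, y, width, height) in pixels


def redact_region(frame: Frame, box: Box) -> Frame:
    """Return a copy of `frame` with `box` replaced by its mean intensity."""
    x, y, w, h = box
    height, width = len(frame), len(frame[0])
    # Clip the box so partially visible objects do not index out of range.
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(width, x + w), min(height, y + h)
    out = [row[:] for row in frame]
    if x0 >= x1 or y0 >= y1:
        return out  # the box lies entirely outside the frame
    pixels = [frame[r][c] for r in range(y0, y1) for c in range(x0, x1)]
    mean = sum(pixels) // len(pixels)
    for r in range(y0, y1):
        for c in range(x0, x1):
            out[r][c] = mean
    return out


def redact_video(frames: List[Frame], track: Dict[int, Box]) -> List[Frame]:
    """Redact the tracked object in every frame that has a box for it.

    `track` maps a frame index to the object's box in that frame.
    Frames without an entry are copied unchanged.
    """
    return [
        redact_region(f, track[i]) if i in track else [row[:] for row in f]
        for i, f in enumerate(frames)
    ]
```

The key point is that the per-frame boxes come from tracking, so the user selects the object once and the same redaction is applied wherever the track places it.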

Fish-eye distortion correction

At the ingestion stage, the video file is decoded and presented as a set of frames. Then, advanced pre-processing algorithms are used to fix the fish-eye distortion of body-worn cameras.
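The case study does not say which distortion model is used. Purely as an illustration, the sketch below applies the one-parameter division model, a common simplification for wide-angle lenses, to a single pixel coordinate. The function name, the distortion center, and the coefficient `k` are assumptions; a real pipeline would calibrate them per camera and remap whole frames.

```python
from typing import Tuple

Point = Tuple[float, float]


def undistort_point(point: Point, center: Point, k: float) -> Point:
    """Map a distorted pixel to its corrected position (division model).

    corrected = center + (point - center) / (1 + k * r^2),
    where r is the distance of the distorted pixel from the center.
    A negative k pushes points outward, which counteracts the barrel
    distortion typical of fish-eye lenses.
    """
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    r2 = dx * dx + dy * dy
    denom = 1.0 + k * r2
    if denom <= 0.0:
        raise ValueError("k is too negative for a point this far from the center")
    return (center[0] + dx / denom, center[1] + dy / denom)


# Example with assumed values: a 1280x720 frame and a mild coefficient.
corrected = undistort_point((740.0, 360.0), (640.0, 360.0), k=-1e-6)
```

The distortion center itself is unchanged, and the correction grows with distance from it, which matches how fish-eye footage bends most strongly near the frame edges.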

Dynamic object recognition and tracking

The solution relies on neural networks to find required entities in every frame, detect people’s poses, and locate the vehicles’ license plates. With all objects of interest discovered, the system is able to track them across a group of frames.
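One common way to carry detections from one frame to the next is to match boxes by overlap. The sketch below shows a greedy intersection-over-union (IoU) association step. It is a generic technique, not a description of the actual tracker, and the `associate` helper, the box format, and the `min_iou` threshold are assumptions.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # hypothetical: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def associate(
    tracks: Dict[int, Box], detections: List[Box], min_iou: float = 0.3
) -> Dict[int, int]:
    """Greedily match track IDs to detection indices, best overlap first."""
    pairs = sorted(
        (
            (iou(box, det), track_id, det_index)
            for track_id, box in tracks.items()
            for det_index, det in enumerate(detections)
        ),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used = set()
    for score, track_id, det_index in pairs:
        if score < min_iou:
            break  # remaining pairs overlap too little to be the same object
        if track_id in matches or det_index in used:
            continue
        matches[track_id] = det_index
        used.add(det_index)
    return matches
```

Tracks left without a match are the "disappeared" entities that recovery logic would then need to handle, and unmatched detections are candidates for new tracks.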

Intelligent custom logic for near-100% detection accuracy

The solution’s custom logic combines four types of video analysis. When used together, they allow for achieving close to 100% detection accuracy in a variety of situations.

  • When a previously captured entity suddenly disappears, the system locates it in the following frames and applies a linear approximation to all the frames in between.
  • Deep learning-based analysis is engaged whenever an identified face disappears and the approximation analysis logic cannot recover it, for example, when a face crosses the frame border.
  • Reverse analysis is applied to previous frames when a person or a vehicle approaches the camera, making them easier to identify.
  • Missed object analysis starts when the object is temporarily hidden from view. The system looks ahead a few frames to quickly recapture it.
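The first analysis type, filling the gap between two sightings of the same entity, can be sketched as straight-line interpolation of the bounding box. This is a minimal illustration under assumed names and an assumed box format, not the system's actual code.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # hypothetical: (x1, y1, x2, y2)


def interpolate_boxes(
    frame_a: int, box_a: Box, frame_b: int, box_b: Box
) -> Dict[int, Box]:
    """Estimate boxes for the frames strictly between two known sightings.

    Each coordinate moves in a straight line from its value at `frame_a`
    to its value at `frame_b`.
    """
    if frame_b <= frame_a:
        raise ValueError("frame_b must come after frame_a")
    span = frame_b - frame_a
    filled: Dict[int, Box] = {}
    for frame in range(frame_a + 1, frame_b):
        t = (frame - frame_a) / span  # 0 at frame_a, 1 at frame_b
        filled[frame] = tuple(a + t * (b - a) for a, b in zip(box_a, box_b))
    return filled


# The entity is seen in frame 10 and again in frame 14; frames 11-13 are filled in.
gap = interpolate_boxes(10, (0, 0, 10, 10), 14, (40, 20, 50, 30))
```

Interpolation only works when the entity is found again later, which is why the other analysis types cover cases such as an object leaving the frame.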

Reporting based on the user-provided target list of entities

  • Users can specify a list of entities they want to track in videos, enabling the system to generate detailed reports tailored to their needs.
  • The reports include extensive metadata for each entity, such as thumbnails of detected vehicles, license plate information, color descriptions, body styles, etc.
  • The system provides a comprehensive overview of the analysis, presenting generated thumbnails and insights into the accuracy of entity identification. It also highlights any misidentified or undetected entities, aiding users in understanding the reliability of the data.
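To make the report structure more concrete, here is a small sketch of how a target list could be turned into a report with per-entity metadata plus lists of uncertain and undetected entities. Every field name, the `build_report` helper, and the `review_below` confidence threshold are hypothetical and chosen only for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class EntityRecord:
    """Hypothetical per-entity metadata produced by video analysis."""
    entity_id: str
    kind: str                      # e.g. "vehicle" or "face"
    thumbnail: str                 # path to the chosen thumbnail image
    confidence: float              # identification confidence, 0.0 to 1.0
    attributes: Dict[str, str] = field(default_factory=dict)


def build_report(
    records: List[EntityRecord], targets: List[str], review_below: float = 0.8
) -> dict:
    """Build a report restricted to the user-provided target entity IDs."""
    wanted = set(targets)
    found = [r for r in records if r.entity_id in wanted]
    found_ids = {r.entity_id for r in found}
    return {
        "entities": [asdict(r) for r in found],
        # Low-confidence identifications the user may want to double-check.
        "needs_review": sorted(r.entity_id for r in found if r.confidence < review_below),
        # Targets that analysis never detected in the footage.
        "undetected": sorted(wanted - found_ids),
    }


records = [
    EntityRecord("veh-1", "vehicle", "thumbs/veh-1.png", 0.93,
                 {"color": "red", "body_style": "sedan"}),
    EntityRecord("face-7", "face", "thumbs/face-7.png", 0.61),
]
report = build_report(records, targets=["veh-1", "face-7", "veh-9"])
report_json = json.dumps(report, indent=2)
```

Keeping uncertain and missing entities in separate lists reflects the idea that the report should show how reliable the data is, not only what was found.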

Powerful functional modules

  • The C++ Windows application helps automate the detection, identification, and tracking of people, objects, and vehicles. This module is responsible for choosing the optimal thumbnails to be included in a report and supports multi-faceted entity search.
  • The custom JS player enables a host of video editing operations, including file import, frame-by-frame navigation, object blurring, zooming, resizing, and cutting.
  • The redaction system governs user management, video storage and search, and the video processing backend.

Essential for public safety, police investigations, and court proceedings

In initial trials, the solution demonstrated superior detection and identification quality compared with similar products intended for professional use. The system relies on highly intelligent logic that detects up to 400% more object instances than the competition.

According to recent estimates, the solution delivers dramatic gains in video processing, boosting police officer productivity by up to 60 times.

As of today, it’s the only end-to-end solution optimized to address a particular set of pain points that police work presents, and the only one to ensure seamless automation of video analysis.

Industry
Public safety, Electronics
Delivery Model
Scope-driven milestone-based development
Effort and Duration
4 months, 16 man-months
Technologies
Python, C++, TensorFlow, OpenCV, Windows Media Foundation, CUDA, cuDNN, IMFMediaEngine, JavaScript, WebAssembly, Emscripten