Solution highlights

  • AI solution for handwritten text recognition
  • Intelligent cropping tools
  • Optical character recognition by neural networks
  • 99.3% cropping accuracy
  • Real-world data, synthetic data, and a mixed approach for neural network training

How to make printed ads ready for data mining

Promotional offers were everywhere: on the streets, in the supermarkets, even on utility poles. Some seemed risky, since a business can rarely afford such deep discounts. Others appeared designed to please the consumer, until the fine print at the bottom of the advertisement shed light on the actual deal.

A competitive retail business can draw more insight from printed ads than an average city dweller chasing the most attractive offer. But where is the single source of all promotions and loyalty programs needed to build a winning pricing policy?

Our client guaranteed that not a single ad would be missed by gathering all information about products and offer types in a unified database. Seeking to cut the cost of manual data processing and automate at least 80% of data handling operations, the client reached out to Oxagile for an AI solution.

Computer vision algorithms address the challenge of OCR

The solution had to be ready for countless advertising pages, each containing a variety of offer types. Given printed coupons as input, the system was expected to give confident answers to the following questions, in order:

How many ad blocks are there on one ad page?

A neural network detects all individual ads on the page, and a cropping tool divides the page into separate ad blocks.
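
As an illustration only, the cropping step can be sketched as follows. This is a minimal sketch, assuming the detector returns pixel boxes as (left, top, right, bottom) tuples; the box format, the `crop_blocks` name, and the toy page are assumptions, not details from the project.

```python
from typing import List, Tuple

# Hypothetical box format: (left, top, right, bottom) in pixels.
Box = Tuple[int, int, int, int]


def crop_blocks(page: List[List[int]], boxes: List[Box]) -> List[List[List[int]]]:
    """Cut one ad block out of the page for every detected box."""
    height = len(page)
    width = len(page[0]) if page else 0
    crops = []
    for left, top, right, bottom in boxes:
        # Clamp to the page bounds so a slightly oversized box cannot fail.
        left, right = max(0, left), min(width, right)
        top, bottom = max(0, top), min(height, bottom)
        if right <= left or bottom <= top:
            continue  # skip empty or inverted boxes
        crops.append([row[left:right] for row in page[top:bottom]])
    return crops


# A toy 4x6 "page" where each pixel stores its own column index.
page = [[x for x in range(6)] for _ in range(4)]
blocks = crop_blocks(page, [(0, 0, 3, 2), (3, 1, 8, 4)])
print(len(blocks), blocks[1][0])  # prints: 2 [3, 4, 5]
```

A real pipeline would slice image arrays rather than nested lists, but the clamping and slicing logic stays the same.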

What’s each offer about?

After cropping, the solution dives into the ad details. Text detection and recognition algorithms identify all key messages, including the brand, price, volume, and product type.
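
Once the text of an ad block is recognized, pulling out individual fields could look roughly like this. The regular expressions, the unit list, the `extract_fields` name, and the sample string are all hypothetical; the source does not describe how fields are parsed.

```python
import re
from typing import Dict, Optional

# Assumed patterns: a dollar price and a number followed by a common unit.
PRICE_RE = re.compile(r"\$\s?(\d+(?:\.\d{2})?)")
VOLUME_RE = re.compile(r"(\d+(?:\.\d+)?)\s?(ml|l|oz|g|kg|lb)\b", re.IGNORECASE)


def extract_fields(text: str) -> Dict[str, Optional[str]]:
    """Return the first price and volume found in recognized text, if any."""
    price = PRICE_RE.search(text)
    volume = VOLUME_RE.search(text)
    return {
        "price": price.group(1) if price else None,
        "volume": f"{volume.group(1)} {volume.group(2)}" if volume else None,
    }


# Made-up recognized text for one ad block.
fields = extract_fields("Acme Cola 2 L bottle $1.99")
print(fields)  # prints: {'price': '1.99', 'volume': '2 L'}
```

Fields that cannot be found come back as `None`, which makes it easy to flag a block for manual review.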

What’s the offer type?

It’s time for a neural network to classify the offers according to their type. Where are the “Buy one, get one” offers? What about the “X for…” option? The system finds them all and generates the Offer Table.
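
To make the two offer types named above concrete, here is a small rule-based sketch. It is not the project's classifier (the text describes a neural network for this step); the labels, patterns, and `classify_offer` name are assumptions chosen for illustration.

```python
import re

# Hypothetical labels and patterns for the two offer types named in the text.
RULES = [
    ("BUY_ONE_GET_ONE", re.compile(r"\bbuy\s+(?:one|1)\W+get\s+(?:one|1)\b", re.IGNORECASE)),
    ("X_FOR_PRICE", re.compile(r"\b\d+\s+for\s+\$?\d", re.IGNORECASE)),
]


def classify_offer(text: str) -> str:
    """Return the first matching offer type, or UNKNOWN if no rule fires."""
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return "UNKNOWN"


# Made-up ad texts collected into a tiny stand-in for the Offer Table.
offer_table = [
    (text, classify_offer(text))
    for text in ["Buy one, get one free", "3 for $5", "Fresh apples"]
]
for row in offer_table:
    print(row)
```

Blocks labeled `UNKNOWN` are the natural candidates for manual processing.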

The solution was expected to process about 20,000 ad blocks daily. Because in some cases only manual methods could identify the offer type, automation was expected to cover at least 80% of all processing cases, with text recognition accuracy close to 100%.
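
The targets above translate into a simple daily split. The arithmetic below uses only the two figures from the text; the variable names are illustrative.

```python
DAILY_AD_BLOCKS = 20_000      # "about 20,000 ad blocks daily" (from the text)
MIN_AUTOMATION_PERCENT = 80   # "at least 80%" (from the text)

# Integer arithmetic avoids floating-point rounding surprises.
automated_min = DAILY_AD_BLOCKS * MIN_AUTOMATION_PERCENT // 100
manual_max = DAILY_AD_BLOCKS - automated_min

print(f"automated: at least {automated_min}, manual: at most {manual_max}")
# prints: automated: at least 16000, manual: at most 4000
```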

The way from cropping to classifying ads

Our team started by researching state-of-the-art neural network models to build the system on, and then moved to the training phase, which included:

  • Dataset preparation
  • Model training and customization (e.g., increasing data processing speed and accuracy, fine-tuning hyperparameters, and configuring the number of layers)
  • Model enhancement as new training data arrives

Cropping: ad pages transform into ad blocks with an accuracy of 99.3%

Among the most accurate real-time neural networks for object detection, the YOLO family (v4 and v5) delivered strong results in cropping: 99.3% accuracy.
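
The text does not say how the 99.3% figure was measured. One common way to score detected boxes is intersection over union (IoU) against hand-labeled boxes, sketched below. The 0.9 threshold, the one-to-one pairing of predicted and expected boxes, and all names are assumptions, not the project's evaluation protocol.

```python
from typing import List, Tuple

# Hypothetical box format: (left, top, right, bottom).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def cropping_accuracy(predicted: List[Box], expected: List[Box],
                      threshold: float = 0.9) -> float:
    """Share of expected boxes whose paired prediction reaches the IoU threshold.

    Assumes predicted[i] is already paired with expected[i].
    """
    if not expected:
        return 0.0
    hits = sum(1 for p, e in zip(predicted, expected) if iou(p, e) >= threshold)
    return hits / len(expected)


expected = [(0, 0, 100, 100), (100, 0, 200, 100)]
predicted = [(0, 0, 100, 98), (150, 0, 250, 100)]
print(cropping_accuracy(predicted, expected))  # prints: 0.5
```

In the toy data, the first crop overlaps its label almost perfectly (IoU 0.98) while the second is shifted by half a block (IoU about 0.33), so one of two crops counts as correct.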

Text recognition: how to break down the nuances of each ad block

To extract product details and the offer type, we used, among others, CRAFT for text detection, a spatial transformer, bidirectional long short-term memory (BiLSTM), connectionist temporal classification (CTC), and an attention module, along with the SynthText and MJSynth synthetic datasets.
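
Of the components listed, connectionist temporal classification is the easiest to show in a few lines. A recognizer with a CTC output emits one label per image frame, including repeats and a special blank; greedy decoding collapses repeats and then drops blanks. The sketch below assumes the per-frame best labels are already available and uses "-" as a placeholder blank symbol; it is a generic illustration, not the project's decoder.

```python
from itertools import groupby
from typing import Iterable

BLANK = "-"  # placeholder blank symbol; real models use a reserved class index


def ctc_greedy_decode(frame_labels: Iterable[str]) -> str:
    """Collapse consecutive repeats, then remove blanks."""
    collapsed = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in collapsed if label != BLANK)


# Made-up per-frame labels for a price tag. The blank between the two 9s
# keeps them from being merged into a single character.
print(ctc_greedy_decode("--$$-1-..9-9"))  # prints: $1.99
```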

CRAFT training: real-world and synthetic data combination
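
The mixed approach can be pictured as drawing each training batch from both pools at a chosen ratio. The sketch below is a generic illustration; the 75% synthetic share, the sample names, and the `mix_samples` helper are assumptions, since the text does not state the ratio or the sampling scheme used.

```python
import random
from typing import List


def mix_samples(real: List[str], synthetic: List[str],
                synthetic_share: float, size: int, seed: int = 0) -> List[str]:
    """Draw a shuffled batch with a fixed share of synthetic samples.

    Samples are drawn with replacement; each pool must be non-empty
    if its share of the batch is greater than zero.
    """
    rng = random.Random(seed)  # seeded so the batch is reproducible
    n_synthetic = round(size * synthetic_share)
    n_real = size - n_synthetic
    batch = rng.choices(synthetic, k=n_synthetic) + rng.choices(real, k=n_real)
    rng.shuffle(batch)
    return batch


# Made-up sample identifiers standing in for images.
batch = mix_samples(["real_1", "real_2"], ["synth_1", "synth_2", "synth_3"],
                    synthetic_share=0.75, size=8)
print(sum(name.startswith("synth") for name in batch), "of", len(batch))
# prints: 6 of 8
```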

More to come: NLP for unlocking the model’s full potential

A tempting offer from a grocery store, a sudden liquidation sale, or seasonal discounts motivate avid shoppers to break open their piggy banks. For other retailers, an abundance of ads also signals that valuable data sources are available to analyze when preparing an effective data-driven strategy.

The only thing left is to process huge volumes of printed ads, a task our customer carefully oversees.

Beyond the high accuracy already achieved by Oxagile’s AI-powered solution, our team is considering an NLP model to improve offer type recognition. In addition to the smart logic that currently delivers almost 93% accuracy in offer type recognition, we could also use machine learning techniques for sentiment analysis.

Industry
Professional Services
Delivery Model
Scope-driven milestone-based development
Effort and Duration
3 months, 9 man-months
Technologies
PyTorch, ONNX, Darknet, YOLOv4, YOLOv5, Character-Region Awareness for Text Detection (CRAFT), Spatial Transformer Network, Bidirectional LSTM, Attention module, CTC-classifier, SynthText, MJSynth