Solution overview

  • Automation of review processing and removal
  • NLP-driven model to identify spam and negativity
  • State-of-the-art ML techniques for text analysis
  • 80% unwanted review prediction accuracy

When Oxagile met a client

This story started with two coinciding goals. On one side was a company dedicated to protecting brands’ reputations on the market. On the other came Oxagile’s team, ready to bring all brand management activities together in one marketing software solution.

Now there is no more headache with rank tracking, social media marketing, and review management, as business owners willingly delegate online reputation monitoring to the SaaS platform.

The story could have stopped here, but it continues. As more businesses join the platform’s 1,600+ current users, new ideas for functional enhancements and service package additions keep emerging.

Insult detected: how to eliminate the negative in no time

“Buried under reviews” is more than a metaphor for the client. Processing batches of comments about platform users’ products or services is our customer’s daily routine. The quicker an unwanted review is identified, the sooner it disappears from the web with no chance of damaging a business.

The reasons a review may have to be deleted are numerous, from being off-topic to outright spam. Still, the hardest part is handling negative comments.

When the client turned to Oxagile with huge volumes of data and a desire to cut the time spent on manual review processing and removal, we already knew where to find a smart solution.

Natural language processing was key

Among the variety of NLP models, we needed to find one capable of detecting a negative tone, a task that occasionally proves tough even for humans. For the text analysis we had in mind, it was not enough for the model to merely spot single words indicating negation.

Negation clues are false friends here — we’re hunting ironic and sarcastic notes, while considering word ambiguity and multipolarity.

During the investigation stage, Oxagile’s deep learning specialists considered different options, checking out their ability to catch the nuances of meaning, and finally chose BERT — a machine learning model for natural language processing.

BERT stands for Bidirectional Encoder Representations from Transformers

While directional models read text either left-to-right or right-to-left, BERT processes the whole sequence of words at once, taking in context from both directions. This bidirectionality is its weapon for understanding context.
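The difference can be sketched with a toy example in plain Python (an illustration of the idea, not BERT’s actual internals): at any given token, a left-to-right model has only seen the preceding words, while a bidirectional model sees the full sequence.

```python
def left_to_right_context(tokens, i):
    """Context visible to a left-to-right (directional) model at position i."""
    return tokens[:i]

def bidirectional_context(tokens, i):
    """Context visible to a bidirectional model such as BERT at position i."""
    return tokens[:i] + tokens[i + 1:]

tokens = ["the", "service", "was", "not", "bad", "at", "all"]

# At the word "bad", a directional model has not yet seen "at all",
# the phrase that flips "not bad" into a positive remark.
print(left_to_right_context(tokens, 4))   # ['the', 'service', 'was', 'not']
print(bidirectional_context(tokens, 4))   # ['the', 'service', 'was', 'not', 'at', 'all']
```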

BERT is not a “blank page”: the model comes pre-trained, which makes for an express training course

Despite the model’s complex architecture and logic, we save time on its training. BERT has already received its basic education and is only waiting to be fine-tuned on our task.

How Oxagile trained BERT to make it decipher negative messages

Training step 1. Introducing training data to the model

During this step, we provided BERT with labeled data. The dataset showed a picture of the reviews that should disappear from the web.

BERT’s “adaptation task” was to absorb all the right answers, analyze them, and identify the common features of the reviews to be removed.

To make it go smoothly, Oxagile’s deep learning specialists prepared a training pipeline for correct data processing.
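A minimal sketch of what such a pipeline might look like is below. The helper names and the whitespace tokenizer are illustrative stand-ins only; the real pipeline would rely on BERT’s own WordPiece tokenizer.

```python
def clean(text):
    """Normalize a raw review before tokenization."""
    return text.lower().strip()

def tokenize(text):
    """Toy whitespace tokenizer, a stand-in for BERT's WordPiece tokenizer."""
    return clean(text).split()

def build_dataset(labeled_reviews):
    """Turn (text, label) pairs into (tokens, label) training examples.
    Label 1 marks a review that should be removed from the web."""
    return [(tokenize(text), label) for text, label in labeled_reviews]

# Hypothetical labeled data in the shape the pipeline expects.
labeled_reviews = [
    ("Great product, fast delivery!", 0),
    ("Total SCAM, avoid this seller ", 1),
]
dataset = build_dataset(labeled_reviews)
```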

Training step 2. Model fine-tuning as a main part of the course

Out of the box, the BERT model comes with default hyperparameters such as the learning rate, optimizer, and number of epochs. At the fine-tuning stage, our team configured all of them to reach an optimal trade-off between classification accuracy and speed.

Also, we defined a threshold to determine whether the reviews should be deleted or not.
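Such a decision threshold can be sketched as follows (the 0.7 cut-off is an illustrative value, not the one chosen for the client):

```python
def should_remove(removal_probability, threshold=0.7):
    """Flag a review for removal when the model's predicted probability
    of it being unwanted exceeds the threshold.

    A lower threshold removes reviews more aggressively (more false
    positives); a higher one is more conservative."""
    return removal_probability >= threshold

print(should_remove(0.91))  # True  -> review gets removed
print(should_remove(0.40))  # False -> review stays online
```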

After 2 months of building the training pipeline, training the model, and fine-tuning it, BERT showed us an 80% prediction accuracy, with:

  • 150,000 reviews: a total amount of data provided
  • 80,000 reviews: used for training
  • 8,000 reviews: used for testing
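Prediction accuracy here is simply the share of held-out test reviews the model classifies correctly. A minimal illustration on toy data:

```python
def accuracy(predictions, labels):
    """Fraction of test examples where the predicted class (1 = remove,
    0 = keep) matches the true label."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

# Toy held-out set: 4 of 5 predictions match the labels, i.e. 80%.
print(accuracy([1, 0, 1, 1, 0], [1, 0, 0, 1, 0]))  # 0.8
```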

The NLP model is now guarding brand reputation

Negative, inappropriate, irrelevant, or spam comments are no longer a nightmare for business owners, as the customer is quickly notified of the ones to be removed.

The solution can identify the toxic potential of testimonials after rapidly processing datasets of comments from product or service consumers.

As more reviews arrive mentioning products or services provided by the brand management platform’s users, model training can continue for the sake of ever higher prediction accuracy.

Professional Services

Delivery Model
Scope-driven, milestone-based development

Effort and Duration
2 months, 334 man-hours

Technologies
PHP, BERT (Bidirectional Encoder Representations from Transformers), PyTorch, Kubernetes