Cerved and Claranet Improve Data Quality and Lower Costs with Serverless AWS Machine-Learning

Executive Summary

With the help of Claranet, an AWS Partner offering consulting services, Cerved, Italy’s leading business information provider and a ratings agency, reduced infrastructure costs and used machine-learning models to improve the accuracy of news article categorization for its media monitoring service by 25 percent. It achieved this by switching from on-premises systems to a serverless AWS environment for machine-learning development.

Italy’s leading business information provider Cerved has reduced infrastructure costs and improved the accuracy and quality of media monitoring data that it provides to clients by implementing a serverless environment on Amazon Web Services (AWS). It can now build, train, and monitor the performance of machine-learning models that automatically review and classify more than 20,000 daily corporate news articles for its media monitoring service.

As one of Europe’s major rating agencies, Cerved monitors and categorizes articles from hundreds of web sources to provide daily corporate and financial news. This service keeps its customers informed about markets and companies, which helps them to better manage risk and stay ahead of competitors. The service is provided direct to customers but is also embedded within Cerved’s other business information services for market intelligence, customer prospecting, and credit risk rating.

“The big difference in using AWS services compared to our previous on-premises system is that the AWS ecosystem gives us machine-learning models, integrates these processes into our wider system, and manages every part of our pipeline from training to deployment.”

- Daniele Tavolaro, Data Engineer, Cerved

Simplifying Machine-Learning Model Development

With the support of Claranet, an AWS Partner offering consulting services, Cerved moved away from its costly and inflexible on-premises, rules-based solution for tagging and categorizing news articles. It now uses an AWS serverless infrastructure that makes it easier for its data scientists and data engineers to develop, train, deploy, and maintain machine-learning models for real-time automated media monitoring in the production environment.

Cerved wanted to improve accuracy, ease maintenance, and gain the ability to quickly extend the functionality of its media monitoring service. Another key reason for the move to AWS was cost savings: switching to an operational expenditure (opex) approach for IT spending would eliminate the need for costly on-premises infrastructure that is underutilized outside of peak periods. “Having fully managed environments that are predefined simplifies development,” says Gabriele Sotto, Data Scientist at Cerved. “This approach enables us to be flexible and independent.” Upon starting the project in mid-2020, Cerved initially focused on building and implementing new machine-learning models for three main components of its media monitoring service for Italian companies: categorizing business articles by type of business event, recognizing companies with various economic and financial activities in Italy, and recognizing geographic locations across Italy.

Building MLOps Skills with Claranet

One challenge that Cerved faced was that, while it had strong internal data science and data engineering skills, it was missing the DevOps skills for machine learning, known as MLOps. This is where Claranet’s expertise in DevOps and MLOps supported the project, with advice on everything from API implementation to architecting the solution. Claranet helped Cerved design and automate the deployment of the developed machine-learning models through serverless infrastructure as code. Claranet is also helping Cerved to plan and design the monitoring and retraining pipelines for the machine-learning models.

Claranet used a training operations approach to deliver a learning path to build Cerved’s internal AWS skills and expertise in these areas. “We provided some courses on big data and machine learning,” says Gianluigi Mucciolo, Senior Solutions Architect at Claranet. “We also provided on-the-job training for some activities where we started to implement a machine-learning pipeline in order to automatically release all the layers for AWS Lambda, and all the layers for libraries.”

Time Savings and Greater Accuracy

Since implementing the redesigned machine-learning models using the AWS serverless development environment, Cerved has achieved an average improvement of 25 percent in how accurately and precisely it automatically tags and categorizes news articles before these are sent to a team of editors for manual review. “This translates into time saved by the editorial team because they need to remove fewer articles that were tagged incorrectly,” says Divna Djordjevic, Data Scientist at Cerved. “And, in the long term, this turns into cost savings and also enables the editorial team to focus on more difficult tasks.”

Another major benefit of using AWS is the infrastructure cost savings, compared with the previous on-premises system. “Now, in the cloud with a serverless AWS solution, we can use the system only when we need it during the two-to-three-hour period when the news arrives,” says Daniele Tavolaro, Data Engineer at Cerved. “So we only pay for the actual use during that morning period.” All of this helps Cerved to deliver improved quality of data, which helps customers to make better decisions to protect against risk and ensure more sustainable growth. Based on the success of the project so far, Cerved plans to expand its use of the serverless MLOps environment to add more machine-learning models to other components of its media monitoring service. It also plans to expose these functionalities through APIs to offer new lines of products for customers.

A Purpose-Built Ecosystem for Machine-Learning

The main AWS services that Cerved uses in this project are AWS Lambda and Amazon Kinesis. Specifically, it uses Amazon Kinesis Data Streams for the different components of the media monitoring service that collects news articles from its many sources. Amazon SageMaker supports the machine-learning training tasks, in which there is a training pipeline for many independent binary classification models. These are then deployed as AWS Lambda layers. The different AWS Lambda functions then classify the news using multi-label classification, according to the different categories of news topics. The core part of the system also matches and recognizes companies and corporate entities based on custom-trained neural networks and Cerved’s own ecosystem of business information, the largest in Italy, which encompasses more than six million active Italian companies. Through another custom model for named entity recognition (NER), the system recognizes locations mentioned in the articles, drawing on external sources such as Italy’s Istituto Nazionale di Statistica.
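As a rough illustration of the pattern described above, the following Python sketch shows how many independent binary classifiers can be combined into one multi-label result inside a Lambda-style handler that reads a Kinesis Data Streams batch. It is not Cerved’s implementation: the label names, the keyword-based scorers standing in for trained models, the 0.5 threshold, and the `id` and `text` fields of the article payload are all assumptions made for the example. Only the event shape (base64-encoded data under `Records[].kinesis.data`) follows the standard Kinesis-to-Lambda integration.

```python
import base64
import json
from typing import Callable, Dict, List

# Hypothetical stand-in for a trained binary model: maps text to a score in [0, 1].
BinaryModel = Callable[[str], float]


def keyword_model(*keywords: str) -> BinaryModel:
    """Toy scorer (assumption): 1.0 if any keyword appears in the text, else 0.0."""
    def score(text: str) -> float:
        lowered = text.lower()
        return 1.0 if any(k in lowered for k in keywords) else 0.0
    return score


# One independent binary classifier per category (label names are illustrative).
MODELS: Dict[str, BinaryModel] = {
    "merger_acquisition": keyword_model("acquires", "merger"),
    "bankruptcy": keyword_model("bankruptcy", "insolvency"),
    "new_plant": keyword_model("opens a plant", "new factory"),
}

THRESHOLD = 0.5  # assumed decision threshold, shared by all models


def classify(text: str) -> List[str]:
    """Multi-label step: run every binary model and keep the labels that pass."""
    return sorted(label for label, model in MODELS.items() if model(text) >= THRESHOLD)


def handler(event: dict, context: object = None) -> List[dict]:
    """Lambda-style entry point for a Kinesis Data Streams batch event."""
    results = []
    for record in event.get("Records", []):
        # Kinesis delivers each record's payload base64-encoded.
        payload = base64.b64decode(record["kinesis"]["data"])
        article = json.loads(payload)
        results.append({"id": article["id"], "labels": classify(article["text"])})
    return results
```

Because each category has its own model, an article can receive zero, one, or several labels, and a single category can be retrained or replaced without touching the others.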

“The big difference in using AWS services compared to our previous on-premises systems is that the AWS ecosystem gives us machine-learning models, integrates these processes into our wider system, and manages every part of our pipelines from training to deployment,” explains Tavolaro. “It’s very easy to do this. We are challenging the standard approach to MLOps today by using the serverless solution to give to our teams the best management of the cost and faster release of the artefact.”

Amazon Data Firehose then collects the information from the classification steps and ingests the results into an Amazon OpenSearch Service index. The results and classifications for news articles are then presented to Cerved’s editorial team for manual review through a custom user interface. “This is a work in progress,” says Djordjevic. “AWS ecosystem is making our system flexible and easier to maintain, as well as providing improved quality for our customers and creating cost savings for Cerved.”
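The hand-off from the classification step to Firehose could look roughly like the sketch below. It assumes each classification result is a small dictionary and that the delivery stream is already configured with the OpenSearch Service index as its destination. The helper names and the stream name are hypothetical. `put_record_batch` and the `FailedPutCount` field in its response belong to the Firehose API, which accepts at most 500 records per call.

```python
import json
from typing import Any, Dict, List, Sequence

MAX_BATCH = 500  # Firehose accepts at most 500 records per PutRecordBatch call


def to_firehose_records(results: Sequence[Dict[str, Any]]) -> List[Dict[str, bytes]]:
    """Encode each classification result as one JSON document per Firehose record."""
    return [{"Data": json.dumps(r, ensure_ascii=False).encode("utf-8")} for r in results]


def deliver(client: Any, stream_name: str, results: Sequence[Dict[str, Any]]) -> int:
    """Send results in batches and return the total number of failed records."""
    records = to_firehose_records(results)
    failed = 0
    for start in range(0, len(records), MAX_BATCH):
        response = client.put_record_batch(
            DeliveryStreamName=stream_name,
            Records=records[start:start + MAX_BATCH],
        )
        failed += response.get("FailedPutCount", 0)
    return failed
```

In a real deployment the `client` argument would be created with `boto3.client("firehose")`. Passing it in as a parameter keeps the function easy to test with a stub, and a production version would also retry the records reported as failed.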

About Cerved

Cerved is a leading business information provider in Italy and one of the major rating agencies in Europe. Cerved helps businesses, banks, institutions, and individuals protect themselves from risk and achieve sustainable growth. Thanks to its unique repository of data and analytics, Cerved provides customers with services, advice, and digital platforms to manage risks and sustain data-driven growth.

About Claranet

Founded in 1996, Claranet Group is a leading European provider of managed network, hosting, and application services. The company works with customers of all sizes to design, build, and manage infrastructure and tooling for mission-critical operations.

Published December 2021