AWS Partner Network (APN) Blog

How Noventiq Leverages Amazon OpenSearch Service for Faster Search Results at Scale

By Achin Birpalia, Co-Founder – Cybex Exim Solutions
By Yogesh Kumar Sharma, Head of Application Modernization – Noventiq
By Rajdip Chaudhuri, Sr. Partner Solutions Architect, Data & Analytics – AWS


Providing a better mobile experience with faster search results is a complicated undertaking, especially when large volumes of data are involved. The trade data sector, covering global exports and imports, handles large data volumes that need to be easily searched, filtered, and viewed by users in highly competitive markets.

Cybex Exim (Cybex) is a powerful data analysis software provider that aims to strengthen its market positioning by leveraging Amazon Web Services (AWS). Cybex was looking to bring a smooth and high-quality experience across its web and mobile platforms as the business grew and took on more prominent clients.

In this post, we will share how Noventiq worked with Cybex to design a solution using Amazon OpenSearch Service as a base. The work enabled Cybex to reduce costs by about 25% while scaling faster and increasing flexibility.

Cybex can now store more than 1.5 TB of data, return basic search results in about four seconds and advanced search results in 6-30 seconds, and easily ingest, search, secure, aggregate, and visualize data.

Noventiq is an AWS Premier Tier Services Partner and Managed Service Provider (MSP) that provides a wide range of cloud services which enable migration, transformation, and modernization. Noventiq also holds the AWS Service Delivery specialization for Amazon OpenSearch Service.

Cutting Costs and Saving Time for Customers

In the previous setup, the Cybex application had powerful capabilities to accommodate large data volumes (~1.5 TB, ~4 billion records/documents) and enable fast views and downloads. Users could access data through the mobile app or web portal, and subscribers could use the dashboard to view and download reports and perform data processing and analyses.

However, Cybex was providing trade reports to customers using manual lookup and data entry each time a customer had a new data requirement, such as a new product or a new country.

Cybex wanted a cloud-based solution that could work on a subscription-based model, enabling customers to access trade reports anytime, anywhere, for any product and country.

Cybex also aimed to facilitate cost-cutting and time savings for clients and help subject matter experts (SMEs) derive actionable insights from its database easily and quickly. The goal was to have search results available in under 10 seconds using data pulled from billions of records.

A dedicated analytics section of graphs and charts using various statistical tools and techniques was also a key priority.

Generating Fast Results at Scale

To begin, Noventiq selected Amazon OpenSearch Service to change the way Cybex delivered data. The team designed a cloud-based solution in which OpenSearch could return results from the database in a matter of seconds.

To ensure good performance, the raw data had to be sorted and validated. Noventiq helped Cybex create a folder naming convention so that data types could be detected quickly. The data was then validated, saved, and uploaded to the web portal.

Noventiq also set up index templates in OpenSearch for the processed data. These JSON-formatted documents define the index mappings, which in turn determine the search and filter criteria available to users.
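As an illustration, the following is a minimal sketch of what such an index template might look like, written as a Python dictionary that serializes to the JSON body of OpenSearch's `_index_template` API. The index pattern `trade-*` and every field name (`country`, `trade_type`, `hs_code`, `product_description`, `shipment_date`, `value_usd`) are hypothetical; the post does not publish Cybex's actual mappings. The shard and replica counts mirror the two-primary, one-replica layout described later in this post.

```python
import json


def build_index_template(pattern: str = "trade-*") -> dict:
    """Return a composable index template body (hypothetical pattern and field names)."""
    return {
        "index_patterns": [pattern],
        "template": {
            "settings": {
                "number_of_shards": 2,    # two primary shards per index
                "number_of_replicas": 1,  # one replica of each primary
            },
            "mappings": {
                "properties": {
                    # keyword fields support exact-match filters and aggregations
                    "country": {"type": "keyword"},
                    "trade_type": {"type": "keyword"},
                    "hs_code": {"type": "keyword"},
                    # text field is analyzed for full-text product search
                    "product_description": {"type": "text"},
                    "shipment_date": {"type": "date"},
                    "value_usd": {"type": "double"},
                }
            },
        },
    }


if __name__ == "__main__":
    # Print the JSON body that could be sent with PUT _index_template/<name>
    print(json.dumps(build_index_template(), indent=2))
```

Mapping filter fields as `keyword` rather than `text` keeps country and trade-type filters exact and cheap, while the free-text description stays searchable.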

Figure 1 shows the high-level architecture of the web and mobile applications.


Figure 1 – High-level architecture.

The solution is built from the following platforms and tools:

  • Web application backend developed with the Python Django framework and deployed on AWS Lambda.
  • HTTPS RESTful APIs published through Amazon API Gateway.
  • Frontend web application developed in Angular, hosted on Amazon Simple Storage Service (Amazon S3), and delivered through Amazon CloudFront.
  • OpenSearch supports search across varied data formats, advanced filters, and a personalized search experience.
  • OpenSearch collects and indexes data from different remote sources and makes it ready for visualization.
  • Mobile app developed for the Android and iOS platforms.
  • Mobile app connects to the backend application through the APIs.
  • Amazon RDS for MySQL used for data storage.
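To show the request/response contract between API Gateway and a Lambda backend, here is a minimal sketch of a Lambda proxy-integration handler. The post's backend is a Django application, so this plain handler is a simplification, not Cybex's code; the query parameters `q`, `country`, and `trade_type` are hypothetical.

```python
import json


def lambda_handler(event, context):
    """Minimal Lambda proxy-integration handler (illustrative; hypothetical parameters)."""
    # API Gateway sets queryStringParameters to None when there is no query string
    params = event.get("queryStringParameters") or {}
    search_text = (params.get("q") or "").strip()
    if not search_text:
        return {
            "statusCode": 400,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"error": "missing 'q' parameter"}),
        }
    # A real backend would query OpenSearch here; this sketch only echoes the parsed request.
    request = {
        "q": search_text,
        "country": params.get("country"),
        "trade_type": params.get("trade_type"),
    }
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        # Proxy integration expects the body as a string, so serialize to JSON
        "body": json.dumps(request),
    }
```

Because both the web and mobile clients go through the same API, a single handler like this can serve both front ends.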

Bringing a Modern Solution to Life

Once the key architecture was in place and data was uploaded to the web portal, the team focused on getting the solution live. Below are the key steps involved in modernizing applications for faster search results with OpenSearch, from initial deployment to continued validation and processing.

Deployment Architecture

The folder structure encodes the trade type and country name so that the relevant data is detected automatically. Each record goes through a series of validations and is then saved to Amazon S3.

If an error is found, the file is rejected and saved to S3 for rechecking. Index templates are created in OpenSearch, and the processed data is then stored there.
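The detect-validate-reject flow above can be sketched as follows. This assumes a hypothetical key layout of `raw/<trade_type>/<country>/<file>` and a hypothetical set of required fields; the post does not specify Cybex's actual folder nomenclature or validation rules.

```python
from typing import Dict, List, Tuple

# Hypothetical folder convention: raw/<trade_type>/<country>/<file name>
VALID_TRADE_TYPES = {"import", "export"}
# Hypothetical required fields for each record
REQUIRED_FIELDS = ("hs_code", "product_description", "shipment_date", "value_usd")


def parse_key(key: str) -> Tuple[str, str]:
    """Detect trade type and country from the S3 object key."""
    parts = key.split("/")
    if len(parts) != 4 or parts[0] != "raw":
        raise ValueError(f"unexpected key layout: {key}")
    trade_type, country = parts[1].lower(), parts[2].lower()
    if trade_type not in VALID_TRADE_TYPES:
        raise ValueError(f"unknown trade type: {trade_type}")
    return trade_type, country


def record_errors(record: Dict[str, str]) -> List[str]:
    """Return the validation problems for one record (empty list if valid)."""
    errors = [f"missing {f}" for f in REQUIRED_FIELDS if not record.get(f)]
    value = record.get("value_usd")
    if value:
        try:
            if float(value) < 0:
                errors.append("negative value_usd")
        except ValueError:
            errors.append("non-numeric value_usd")
    return errors


def destination_prefix(key: str, records: List[Dict[str, str]]) -> str:
    """Route the whole file to processed/ or rejected/; any bad record rejects the file."""
    try:
        trade_type, country = parse_key(key)
    except ValueError:
        return "rejected/"
    if any(record_errors(r) for r in records):
        return "rejected/"
    return f"processed/{trade_type}/{country}/"
```

Rejecting at the file level, as the post describes, keeps a bad upload out of OpenSearch entirely and leaves the original in S3 for rechecking.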

The OpenSearch scroll API is used to download large result sets quickly and efficiently.
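A scroll-based download loop might look like the sketch below. It assumes `client` is an opensearch-py `OpenSearch` client (or any object with compatible `search`, `scroll`, and `clear_scroll` methods); the page size and keep-alive values are illustrative, not Cybex's settings.

```python
from typing import Any, Dict, Iterator


def scroll_all(client: Any, index: str, query: Dict[str, Any],
               page_size: int = 1000, keep_alive: str = "2m") -> Iterator[Dict[str, Any]]:
    """Yield every hit matching `query` by paging through the scroll API.

    `client` is assumed to be an opensearch-py client or a compatible object.
    """
    # The first request opens a scroll context that stays alive for `keep_alive`
    resp = client.search(index=index, body={"query": query},
                         scroll=keep_alive, size=page_size)
    scroll_id = resp.get("_scroll_id")
    try:
        while True:
            hits = resp["hits"]["hits"]
            if not hits:
                break  # an empty page means the result set is exhausted
            yield from hits
            resp = client.scroll(scroll_id=scroll_id, scroll=keep_alive)
            # The scroll ID can change between pages; always use the latest one
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        # Release the search context on the cluster as soon as we are done
        if scroll_id:
            client.clear_scroll(scroll_id=scroll_id)
```

opensearch-py also provides `opensearchpy.helpers.scan`, which wraps the same pattern; writing the loop out makes the keep-alive and cleanup steps explicit.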


Figure 2 – Deployment architecture.

Data Flow Architecture

In the data flow architecture, raw data is uploaded to Amazon S3. This triggers an AWS Lambda function that identifies, validates, processes, and uploads the data into the OpenSearch cluster. A copy of the processed data is retained in S3 for future reference.
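The entry point of such a Lambda function typically starts by reading the bucket and key from the S3 event notification. The sketch below shows only that step; the handler name, the bucket name in the usage example, and the outlined later steps are hypothetical placeholders rather than Cybex's implementation.

```python
from typing import Any, Dict, List, Tuple
from urllib.parse import unquote_plus


def s3_objects_from_event(event: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Extract (bucket, key) pairs from an S3 event notification delivered to Lambda."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (a space becomes '+'), so decode them
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


def handler(event, context):
    """Hypothetical ingestion entry point; later steps are only outlined in comments."""
    objects = s3_objects_from_event(event)
    for bucket, key in objects:
        # 1. download the object from S3
        # 2. identify trade type and country, then validate the records
        # 3. index valid records into OpenSearch and keep a processed copy in S3
        print(f"would process s3://{bucket}/{key}")
    return {"processed": len(objects)}
```

For example, an upload of `raw/export/india/jan 2023.csv` to a bucket named `cybex-raw` would be reported as the pair `("cybex-raw", "raw/export/india/jan 2023.csv")`.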

OpenSearch analyzers are used to improve search quality, and an aggregation layer is built on top of the processed data for aggregate and trend analysis.
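To make the analyzer and aggregation ideas concrete, the sketch below defines a custom analyzer from built-in OpenSearch components (the `standard` tokenizer with `lowercase` and `asciifolding` filters) and a monthly trend query that uses a `date_histogram` bucket aggregation with a `sum` metric. The analyzer name and all field names are hypothetical; the post does not say which analyzers or aggregations Cybex uses.

```python
from typing import Any, Dict

# Hypothetical custom analyzer for product descriptions: standard tokenizer plus
# lowercase and ASCII folding, so "Café" and "cafe" produce the same token.
ANALYSIS_SETTINGS: Dict[str, Any] = {
    "analysis": {
        "analyzer": {
            "product_text": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "asciifolding"],
            }
        }
    }
}


def monthly_trend_query(country: str, trade_type: str) -> Dict[str, Any]:
    """Monthly shipment totals for one country and trade type (hypothetical fields)."""
    return {
        "size": 0,  # only aggregation results are needed, not individual hits
        "query": {
            "bool": {
                "filter": [
                    {"term": {"country": country}},
                    {"term": {"trade_type": trade_type}},
                ]
            }
        },
        "aggs": {
            "per_month": {
                # one bucket per calendar month of shipment_date
                "date_histogram": {"field": "shipment_date", "calendar_interval": "month"},
                # sum of shipment values inside each monthly bucket
                "aggs": {"total_value": {"sum": {"field": "value_usd"}}},
            }
        },
    }
```

Setting `size` to 0 keeps trend requests light, because OpenSearch returns only the bucket totals rather than the underlying documents.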

The client app sends the required search query. Once data processing is in place, the team can create templates for the index queries and then define the mappings for the index templates. Figure 3 depicts how data is ingested and indexed in OpenSearch.
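One way a client search request could be turned into an OpenSearch query is sketched below. The request shape and field names are hypothetical. The sketch combines a scored full-text `match` with unscored `term` filters inside a `bool` query, a common pattern for search-plus-filter screens.

```python
from typing import Any, Dict, List, Optional


def build_search_query(text: str, country: Optional[str] = None,
                       trade_type: Optional[str] = None,
                       size: int = 50) -> Dict[str, Any]:
    """Turn a client search request into an OpenSearch bool query (hypothetical fields)."""
    filters: List[Dict[str, Any]] = []
    if country:
        # keyword fields are matched exactly, so normalize the case first
        filters.append({"term": {"country": country.lower()}})
    if trade_type:
        filters.append({"term": {"trade_type": trade_type.lower()}})
    return {
        "size": size,
        "query": {
            "bool": {
                # scored full-text match on the analyzed description field
                "must": [{"match": {"product_description": text}}],
                # filters do not affect scoring and can be cached by OpenSearch
                "filter": filters,
            }
        },
    }
```

Putting country and trade type in `filter` rather than `must` keeps them out of relevance scoring, which fits the exact-match nature of those criteria.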


Figure 3 – Data ingestion and index flow.

Optimizing the Solution for Current and Future Needs

Noventiq continued to enhance the data flow architecture to accommodate growing data sizes and changing needs. As the number of records and documents increased, the team reduced the number of shards per node and increased the size of individual nodes, per OpenSearch best practices.

At the time of optimization, the key OpenSearch metrics were as follows:

  • More than 4,000 indices.
  • Individual index sizes ranged from 5 MB to 1.8 GB, with the average index size between 200 MB and 600 MB.
  • Average CPU utilization was 20%.
  • Java Virtual Machine (JVM) memory pressure was 32%.
  • Report indices were created per year and quarter.
  • Indices were created based on these factors:
    • Country and trade type
    • Index size
  • One shard was created per index.

To manage these large data loads, Noventiq began by reindexing the data and reducing replicas, and then split the report indices. The team then moved OpenSearch to AWS Graviton-based instances, which improved memory efficiency and throughput. The key changes were:

  • Reindexed data to reduce the number of shards for existing data.
  • Reduced the number of replicas to one for all indices.
  • Created data indices on a yearly basis and two separate indices for each country (one for import and another for export).
  • Created two indices for each data type.
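The yearly, per-country, per-trade-type layout above can be captured in a small naming helper, together with the body of a `_reindex` request that merges several small indices into one. The `trade-<trade_type>-<country>-<year>` naming scheme and the quarterly source index names in the example are hypothetical.

```python
import re
from typing import Any, Dict, List

# Hypothetical naming scheme: one index per year, country, and trade type,
# which yields two indices (import and export) per country per year.


def data_index_name(trade_type: str, country: str, year: int) -> str:
    trade_type = trade_type.lower()
    if trade_type not in ("import", "export"):
        raise ValueError(f"unknown trade type: {trade_type}")
    # Index names must be lowercase; collapse anything else into '-'
    slug = re.sub(r"[^a-z0-9]+", "-", country.lower()).strip("-")
    return f"trade-{trade_type}-{slug}-{year}"


def reindex_body(source_indices: List[str], dest_index: str) -> Dict[str, Any]:
    """Body for POST _reindex that copies several source indices into one destination."""
    return {"source": {"index": source_indices}, "dest": {"index": dest_index}}
```

`_reindex` does not copy settings or mappings from the source indices, so the destination index (or a matching index template) should already define the desired shard and replica counts before the data is copied.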

Each replica is a full copy of an index and needs the same amount of disk space. Noventiq defined at least one replica per OpenSearch index to prevent data loss and improve search performance.

Each index was divided into two primary shards with one replica each, for a total of four shards per index.
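The shard and storage arithmetic behind the last two paragraphs is simple: every replica is a full copy of each primary shard, so both the shard count and the data footprint scale with one plus the number of replicas. The helper names below are illustrative, and the storage figure ignores indexing and operating overhead.

```python
def total_shards(primaries: int, replicas: int) -> int:
    """Each replica duplicates every primary shard."""
    return primaries * (1 + replicas)


def storage_needed(primary_data: float, replicas: int) -> float:
    """Disk for the data itself (same unit as the input); replicas need as much space as primaries."""
    return primary_data * (1 + replicas)


print(total_shards(2, 1))       # 4 shards per index
print(storage_needed(1.5, 1))   # 3.0, e.g. 1.5 TB of primary data becomes 3.0 TB with one replica
```

This is why reducing replicas to one and consolidating thousands of small indices, as described above, directly lowers both the shard count and the storage bill.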

Conclusion

In this post, we showed how Amazon OpenSearch Service enabled Cybex Exim (Cybex) to greatly improve its customers' search experience. The post also provides a blueprint that others can follow to improve data management.

Users now benefit from automated detection of data types and other categories, data management and monitoring for administrators, faster data processing through index mapping, and fast search results.

By tuning the cluster configuration for different data types, Noventiq helped Cybex significantly reduce costs, scale faster, and increase flexibility. Basic searches over more than 1.5 TB of data now return in about four seconds, and advanced searches in 6-30 seconds.

Through this work, Noventiq has helped Cybex give users a flexible and robust way to analyze trade reports 24×7. End users also save time, cost, and stress because they can instantly access multiple reports for multiple countries and generate powerful analytical reports in less than 10 seconds. The results support a superior web and mobile search experience that has enabled Cybex to expand its customer base.

Reach out to Noventiq to learn more about its solutions.



Noventiq – AWS Partner Spotlight

Noventiq is an AWS Premier Tier Services Partner and MSP that provides a wide range of cloud services which enable migration, transformation, and modernization.

Contact Noventiq | Partner Overview