AWS Big Data Blog

How Octus achieved 85% infrastructure cost reduction with zero downtime migration to Amazon OpenSearch Service

As data volumes continue to grow exponentially, there is increasing pressure to optimize search infrastructure costs while maintaining the high performance and reliability that mission-critical workloads demand. Many companies find themselves managing complex, expensive search systems that require significant operational overhead and limit their ability to scale efficiently. The challenge becomes even more acute when organizations need to migrate between search systems, a process that traditionally involves substantial downtime, complex data synchronization, and significant impact on business operations. Enterprise applications cannot afford service interruptions that could impact customer experiences, business intelligence, or operational continuity. Migration strategies need to deliver cost optimization and operational improvements while maintaining zero downtime and facilitating complete data integrity throughout the transition process.

Founded in 2013, Octus, formerly Reorg, is the essential credit intelligence and data provider for the world’s leading buy side firms, investment banks, law firms and advisory firms. By surrounding unparalleled human expertise with proven technology, data and AI tools, Octus unlocks powerful truths that fuel decisive action across financial industries.

This post highlights how Octus migrated its Elasticsearch workloads running on Elastic Cloud to Amazon OpenSearch Service. The journey traces Octus’s shift from managing multiple systems to adopting a cost-efficient solution powered by OpenSearch Service. Along the way, we share the architecture choices and implementation strategies that made the migration successful. The result is uninterrupted service availability throughout migration, with improved performance and greater cost efficiency.

Strategic requirements

We identified several requirements that made Amazon OpenSearch Service the right choice for the migration:

  • Cost efficiency: The OpenSearch Service pricing model enabled us to optimize cloud spend without compromising performance.
  • Responsive support: AWS provided dependable, high-quality support to accelerate issue resolution and instill confidence.
  • Consistent reliability: OpenSearch Service provides an SLA of up to 99.99%, offering the reliability required for Octus’s mission-critical workloads.
  • Seamless migration with no query downtime: Migration Assistant for Amazon OpenSearch Service provided Octus with a migration path while maintaining uninterrupted query availability during the migration, facilitating business continuity.
  • Operational simplification: Consolidating onto AWS reduced infrastructure complexity while maintaining high security standards.

Solution overview

The Migration Assistant for Amazon OpenSearch Service provides a suite of tools to aid in Elasticsearch to OpenSearch Service migrations. Octus used the following capabilities for their migration:

  • Metadata migration: The tool enabled Octus to migrate dozens of indices with diverse mappings and settings. When a backward incompatibility was identified with timestamp metadata, a custom JavaScript transformation, integrated directly into the Migration Assistant tooling, was applied to automatically adjust the mappings across the indices and facilitate compatibility.
  • Historical data migration: Octus used Reindex-from-Snapshot to migrate the historical documents from a point-in-time snapshot of the source cluster, scaling this process without impacting the source cluster since the snapshot was stored in Amazon Simple Storage Service (Amazon S3). Reindex-from-Snapshot also enabled Octus to adjust the sharding scheme during migration, helping to optimize cluster performance on the target.
  • Live Traffic Replay: Once backfill was complete, Octus used Migration Assistant’s Traffic Replayer to send the captured live traffic (from the Traffic Capture Proxy) to the target cluster with required request transformations for OpenSearch Service compatibility, resulting in the target cluster containing the documents from the source cluster with updates being performed in real time.

The following diagram illustrates the implementation architecture for this migration.


Figure 1 – Migration Assistant architecture with migration steps

For more information about the Migration Assistant for Amazon OpenSearch Service, visit the AWS Solutions home page.

Each node in the diagram correlates to the following steps in the migration process:

  1. Client traffic is directed to the existing cluster.
  2. An Application Load Balancer with capture proxies relays traffic to a source while replicating data to Amazon Managed Streaming for Apache Kafka (Amazon MSK).
  3. Using the migration console, a point-in-time snapshot is taken. Once the snapshot completes, the Metadata Migration Tool is used to establish indexes, templates, component templates, and aliases on the target cluster. With continuous traffic capture in place, Reindex-from-Snapshot migrates data from the snapshot of the source cluster.
  4. Once Reindex-from-Snapshot is complete, captured traffic is replayed from Amazon Managed Streaming for Apache Kafka (Amazon MSK) to the target cluster by Traffic Replayer.
  5. Performance and behavior of traffic sent to the source and target clusters are compared by reviewing logs and metrics.
  6. After confirming that the target cluster’s functionality meets expectations, clients are redirected to the new target.
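
The ordering of the steps above (capture first, then snapshot, then backfill, then replay) is what lets the target converge on the source without pausing writes. The following is a minimal toy model of that ordering, not the Migration Assistant implementation: the `Cluster` and `CaptureProxy` classes are hypothetical stand-ins, a Python list stands in for the Amazon MSK topic, and writes are assumed to be index-by-ID operations so that replaying a write already contained in the snapshot is harmless.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical in-memory stand-in for a search cluster."""
    docs: dict = field(default_factory=dict)

    def index(self, doc_id, body):
        # Index-by-ID: writing the same ID again overwrites the document.
        self.docs[doc_id] = body


@dataclass
class CaptureProxy:
    """Hypothetical stand-in for the capture proxy in step 2."""
    source: Cluster
    log: list = field(default_factory=list)  # stands in for the Kafka topic

    def index(self, doc_id, body):
        self.log.append((doc_id, body))  # record the write for later replay
        self.source.index(doc_id, body)  # forward the write to the source


# Data that already existed before capture was enabled.
source = Cluster()
source.index("a", {"v": 1})

# Step 2: clients now write through the capture proxy.
proxy = CaptureProxy(source)
proxy.index("b", {"v": 1})

# Step 3: point-in-time snapshot, then backfill the target from it.
snapshot = dict(source.docs)
proxy.index("a", {"v": 2})  # a write that arrives after the snapshot

target = Cluster()
for doc_id, body in snapshot.items():
    target.index(doc_id, body)

# Step 4: replay the captured writes, in order, against the target.
for doc_id, body in proxy.log:
    target.index(doc_id, body)

# Step 5: compare source and target before redirecting clients.
print(target.docs == source.docs)
```

Because capture starts before the snapshot is taken, every write lands in the snapshot, in the captured log, or in both, which is why the comparison in step 5 can succeed.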

Complete migration and optimization journey

Octus’s migration from Elastic Cloud to Amazon OpenSearch Service encompassed both the core migration effort and subsequent optimization phases. The goal was to successfully migrate the search infrastructure, applications, and data from Elastic Cloud to a new OpenSearch Service domain with minimal disruption, while continuously optimizing performance and costs based on real-world usage data.

Octus used their in-house custom infrastructure frameworks (their internal tooling for infrastructure automation) to build, deploy and monitor the target OpenSearch Service 1.3 domain, establishing a solid foundation for the migration. This approach used familiar internal processes while moving to the fully managed AWS service. Refer to AWS documentation to implement security best practices when using OpenSearch Service.

Pre-migration optimization

Prior to initiating the migration, Octus conducted optimization activities on the source Elasticsearch cluster to streamline the migration process. This included removing unused indexes that had accumulated over time and removing large documents that would unnecessarily extend migration duration and increase storage transfer costs. These preparatory steps significantly reduced the data volume requiring migration and minimized the overall migration complexity, enabling more efficient use of the Migration Assistant tools.

Technical constraints and version considerations

The migration involved specific version compatibility challenges that influenced the technical approach. The source Elasticsearch cluster was running version 7.17, and the Python client applications were also constrained to Elasticsearch 7.17 compatibility. To support the transition, the team used Reindex-from-Snapshot (RFS), which enables cross-system migrations by reindexing data from existing snapshots into a new OpenSearch Service cluster. RFS also rewrites indices created on older versions of Lucene, simplifying future upgrades to the latest version of OpenSearch Service. After evaluating a move to OpenSearch 1 or 2, Octus selected OpenSearch 1.3 as the target to minimize client-side changes and reduce migration complexity, while positioning themselves for simpler upgrades later.

The version selection particularly impacted the R application environment, as R language (an open-source programming language for statistical computing and data analysis) lacked native OpenSearch 1.3 client support. This constraint required Octus to develop a custom client solution using the ropensci/elastic library to integrate with the new OpenSearch Service domain. The Python environment presented similar challenges, where the Elasticsearch 7.17 client constraints necessitated careful consideration of the migration approach. These client compatibility concerns were among the factors that influenced the choice of Migration Assistant tools over traditional snapshot-based methods, as the Migration Assistant provided better support for managing version-specific client interactions during the transition.

Looking forward, Octus plans to upgrade to newer OpenSearch versions as their application stack evolves and client library support matures, so that they can leverage the latest features and performance improvements while maintaining the stability achieved through this migration.

Application modernization across multiple languages

The application changes represented a significant technical undertaking across multiple programming environments:

  • Legacy PHP systems (5.6 and Laravel 4.2): Octus removed mapping types from requests sent to OpenSearch, because specifying mapping types is not supported, while continuing to use the elasticsearch connector library with username/password authentication.
  • Modern PHP applications (8.1 and Laravel 9): These underwent more comprehensive changes, replacing the elasticsearch/elasticsearch library with the opensearch-project/opensearch-php client and leveraging IAM authentication to connect to the clusters.
  • Python environment: Applications spanning versions 3.8, 3.10, 3.11, and 3.13 with Django frameworks 2.1, 3.2, and 5.2 required replacing the elasticsearch library with opensearch-py and transitioning to IAM authentication.
  • R applications: For R 4.5.1 applications, Octus built a custom client on top of the ropensci/elastic library to maintain compatibility.
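
For the Python change described above, the following is a minimal sketch of what an opensearch-py client with IAM (SigV4) authentication can look like. It assumes the `opensearch-py` and `boto3` packages are installed and that AWS credentials are available in the environment. The helper names `connection_settings` and `make_client`, and the host and Region values, are hypothetical and are not taken from Octus’s code.

```python
def connection_settings(host: str) -> dict:
    """Transport settings for an HTTPS connection to a domain endpoint."""
    return {
        "hosts": [{"host": host, "port": 443}],
        "use_ssl": True,
        "verify_certs": True,
    }


def make_client(host: str, region: str):
    """Build an OpenSearch client that signs requests with IAM credentials."""
    # Imported lazily so this module loads even where the packages are absent.
    import boto3
    from opensearchpy import AWSV4SignerAuth, OpenSearch, RequestsHttpConnection

    # Resolve credentials from the standard AWS credential chain.
    credentials = boto3.Session().get_credentials()
    auth = AWSV4SignerAuth(credentials, region)
    return OpenSearch(
        http_auth=auth,
        connection_class=RequestsHttpConnection,
        **connection_settings(host),
    )


# Example usage (placeholder endpoint and Region, not real values):
# client = make_client("search-example.us-east-1.es.amazonaws.com", "us-east-1")
# print(client.info())
```

Keeping the transport settings in a separate function makes it possible to review or unit test them without AWS credentials or network access.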

Traffic routing and enhanced monitoring

To facilitate the migration, Octus redirected their existing clients to send requests to the source cluster through Migration Assistant’s Traffic Capture Proxy, so that live traffic could be captured and later replayed against the target cluster.

The monitoring infrastructure underwent significant enhancement during this process. Octus’s observability infrastructure monitors the overall health of OpenSearch Service clusters which includes cluster manager and data nodes, network, data storage, security and IAM access. It also monitors the indexing and search performance of their applications. This alleviated the need for a separate monitoring cluster as logs and metrics were shipped directly to Datadog, significantly improving observability. The Datadog monitors were defined using Infrastructure-as-Code and integrated seamlessly into their infrastructure frameworks.

Cutover and initial results

The Site Reliability Engineering team meticulously planned the release, migrating from Elasticsearch to OpenSearch Service and cutting applications over from the Elasticsearch clients to the OpenSearch Service clients with no application downtime and zero data loss. The initial migration phase resulted in a 52% cost reduction, along with operational benefits including full Infrastructure-as-Code implementation for infrastructure and monitoring, and enhanced observability.

Post-migration optimization

Following the migration, Octus conducted comprehensive optimization based on operational data from production and other environments in the new OpenSearch Service setup. This real-world usage data provided valuable insights into actual resource consumption, enabling informed decisions regarding further cluster resizing.

Through usage metric analysis and strategic resizing, Octus aligned cluster size more precisely with operational needs, facilitating continued performance while minimizing expenditure. This optimization phase delivered an additional 33% cost reduction compared to the original Elastic Cloud costs, bringing the total reduction to 85% while maintaining consistent and optimal performance.
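
The two reductions above are both measured against the original Elastic Cloud cost, so they add up as percentage points. The following short calculation makes that explicit. The baseline of 100 cost units is an arbitrary assumption used only for illustration, and the last figure (the saving relative to the post-migration cost) is derived here rather than reported by Octus.

```python
# Arbitrary baseline: treat the original Elastic Cloud cost as 100 units.
baseline = 100

# Phase 1: the migration removed 52 points of the baseline cost.
after_migration = baseline - 52

# Phase 2: resizing removed a further 33 points of the baseline cost.
after_optimization = after_migration - 33

total_reduction_pct = baseline - after_optimization

# Derived figure: phase 2 saving relative to the post-migration cost.
phase2_relative_pct = 100 * (after_migration - after_optimization) / after_migration

print(after_migration, after_optimization, total_reduction_pct, phase2_relative_pct)
```

Under these assumptions, the resizing phase alone cut the post-migration bill by roughly two thirds.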

Operational monitoring

Octus uses Datadog to monitor both search and indexing latency, providing real-time visibility into Amazon OpenSearch Service cluster performance. The following screenshot shows how custom Datadog dashboards provide a live view of the OpenSearch Service clusters. This visualization offers both a high-level overview and detailed insights into the ingestion process, helping the team track storage and document count. The bottom half of the dashboard presents a time-series view of individual node health and performance metrics such as read and write latency, throughput, and IOPS.


Figure 2 – Datadog dashboards

Migration observability

Migration Assistant for Amazon OpenSearch Service provides several dashboards to observe and validate the progress of a migration. By using these observability features, customers can track both backfill and live capture and replay progress, building confidence before switching production workloads to the target cluster. The following graphs are an example from Octus’s migration, where approximately 4 TB of data was migrated in about 9 hours (from 08:00 to 17:00).


Figure 3 – Backfill progress by disk usage


Figure 4 – Backfill progress by searchable documents
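
The backfill figures above (approximately 4 TB in about 9 hours) imply an average throughput that can be estimated with simple arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes) and a constant rate over the whole window. Both are simplifications, and the resulting number is an estimate rather than a measured value from the migration.

```python
# Approximate figures from the backfill described above.
data_bytes = 4 * 10**12        # about 4 TB, assuming decimal units
duration_seconds = 9 * 3600    # about 9 hours (08:00 to 17:00)

# Average throughput in megabytes per second (1 MB = 10**6 bytes).
throughput_mb_per_s = data_bytes / duration_seconds / 10**6

print(f"{throughput_mb_per_s:.1f} MB/s")
```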

Once the backfill is complete, the captured traffic is replayed to synchronize ongoing activity between the source and target clusters.

At the time the backfill finished (around 17:00), the target cluster was approximately 467 minutes behind the source. The replay process rapidly reduced this lag by processing captured traffic at a faster rate than it was originally ingested at the source.


Figure 5 – Replay lag after backfill completion

When the lag time reached 0, the target cluster was fully in sync and production traffic could safely be rerouted. Octus chose to observe replayed traffic on the target for several days before making the final switchover.
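
The catch-up behavior described above can be reasoned about with a simple model. If the replayer processes captured traffic at `replay_speedup` times the rate at which the source originally received it, the lag shrinks by `replay_speedup - 1` minutes for every minute of replay. The function name and the speedup values below are hypothetical, because the post does not state the actual replay rate.

```python
def catch_up_minutes(initial_lag_minutes: float, replay_speedup: float) -> float:
    """Estimate the time for replay lag to reach zero.

    The source keeps producing one minute of traffic per minute while the
    replayer consumes `replay_speedup` minutes of traffic per minute, so the
    lag closes at (replay_speedup - 1) minutes per minute.
    """
    if replay_speedup <= 1:
        raise ValueError("replay must run faster than real time to catch up")
    return initial_lag_minutes / (replay_speedup - 1)


# 467 minutes is the lag reported when the backfill finished;
# the speedup factors are illustrative assumptions.
for speedup in (2.0, 3.0):
    print(speedup, catch_up_minutes(467, speedup))
```

The model also shows why a replayer running at exactly the original rate would never close the gap.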

Achieving excellence

Octus’s migration to Amazon OpenSearch Service has yielded remarkable results:

  • Scalability – Octus has almost doubled the number of documents available for Q&A across three environments in days instead of weeks. Their use of Amazon Elastic Container Service (Amazon ECS) with AWS Fargate with auto scaling rules and controls gives them elastic scalability for their services during peak usage hours.
  • Cost reduction – By moving away from Elastic Cloud to OpenSearch Service, Octus’s monthly infrastructure costs are now 85% lower.
  • Enhanced search performance – Octus maintained consistent response times throughout the migration with no negative impact on latency, while achieving a 20% improvement in query throughput and overall search performance.
  • Zero downtime – Octus experienced zero downtime during migration and 100% uptime overall for the whole application.
  • Reduced operational overhead – Post-migration, Octus’s DevOps and SRE teams see a 30% lower maintenance burden and overhead. Supporting SOC2 compliance is also straightforward now that they’re using one system.
  • Accelerated timeline delivery – The entire migration was completed ahead of schedule, moving from planning to full completion in under one quarter.

“Moving from Elastic Cloud to Amazon OpenSearch Service was a key component of our broader strategy to minimize third-party dependencies and strengthen the reliability of Octus’ system infrastructure. Migration Assistant for Amazon OpenSearch Service enabled us to execute a seamless transition with zero data loss and virtually no downtime for our users.” – Vishal Saxena, CTO, Octus

Conclusion

In this post, we showed you how Octus successfully migrated their Elasticsearch workloads from Elastic Cloud to Amazon OpenSearch Service using the Migration Assistant for OpenSearch Service, achieving zero downtime and significant operational improvements.

The Migration Assistant for OpenSearch Service supported this complex migration through its comprehensive suite of tools. The Metadata Migration capability migrated dozens of indices with diverse mappings and settings, with custom JavaScript transformations handling backward incompatibilities. Reindex-from-Snapshot migrated the historical documents from point-in-time snapshots without impacting the source cluster, while also optimizing the sharding scheme for improved performance. Live Traffic Replay made sure the target cluster remained synchronized with real-time updates throughout the migration process.

The migration delivered substantial results across multiple dimensions. Octus achieved an 85% reduction in monthly infrastructure costs while nearly doubling the number of documents available for search across three environments. Query throughput improved by 20%, with consistent response times and no negative impact on latency. The migration maintained zero downtime and 100% uptime for the entire application, with DevOps and SRE teams experiencing a 30% lower maintenance burden and operational overhead. The entire migration was completed ahead of schedule in under one quarter.

To learn more about the Migration Assistant for OpenSearch Service and how it can help you achieve similar results, visit the AWS Solutions home page.

Visit Octus to learn how we deliver rigorously verified intelligence at speed and create a complete picture for professionals across the entire credit lifecycle. Follow Octus on LinkedIn and X.


About the Authors

Harmandeep Sethi

Harmandeep is Head of SRE Engineering and Infrastructure Frameworks at Octus, with nearly 10 years of experience leading high-performing teams in the implementation of large-scale systems. He has played a pivotal role in transforming and modernizing Octus’s Search Engine infrastructure and services by driving best practices in observability, resilience engineering, and the automation of operational processes through Infrastructure Frameworks.

Serhii Shevchenko

Serhii is a Site Reliability Engineer at Octus. With 9 years of combined experience in software development and site reliability engineering, his expertise focuses on enhancing system reliability and performance. He was a key developer on the application side for the company’s critical migration from Elastic Cloud to Amazon OpenSearch Service. His planning was instrumental in executing the transition with zero client-facing downtime.

Govind Bajaj

Govind is a Senior Site Reliability Engineer at Octus, specializing in architecting and implementing scalable infrastructure that supports high-performing engineering teams and critical systems. With over 8 years of experience, he excels at breaking down complex problems and turning them into practical, well-designed solutions, with a strong focus on building secure, observable, and resilient platforms.

Virendra Shinde

Virendra is the Head of Platform at Octus, where he oversees cloud infrastructure, site reliability, and the core frameworks that power the Octus product suite. Before joining Octus, he spent two years at Grayscale Investments building an investor portal and data APIs from the ground up. Prior to that, he spent eight years at Blackstone leading multiple development teams. He holds a Master’s degree in Information Management from the University of Maryland.

Brian Presley

Brian is a Software Development Manager at OpenSearch, leading teams behind OpenSearch Migrations and OpenSearch Serverless to build scalable, high-impact search and analytics solutions.

Andre Kurait

Andre is a Software Development Engineer II at AWS, based in Austin, Texas. He is currently working on Migration Assistant for Amazon OpenSearch Service. Prior to joining Amazon OpenSearch, Andre worked within Amazon Health Services. In his free time, Andre enjoys traveling, cooking, and playing in his church sport leagues. Andre holds Bachelor of Science degrees in Computer Science and Mathematics from the University of Kansas.

Vaibhav Sabharwal

Vaibhav is a Senior Solutions Architect at AWS based out of New York. He is passionate about learning new cloud technologies and assisting customers in building cloud adoption strategies, designing innovative solutions, and driving operational excellence. As a member of the Financial Services and Storage Technical Field Communities at AWS, he actively contributes to the collaborative efforts within the industry.