How Coupa migrated from a self-hosted Redis to fully managed Amazon ElastiCache
This is a guest post by Ramesh Sencha, Lead Cloud Engineer at Coupa.
Coupa Software (NASDAQ: COUP) is a leader in Business Spend Management (BSM). Coupa gives companies around the world the visibility and control they need to spend smarter and safer. Coupa serves over 2,000 customers across the globe with a software as a service (SaaS) product for BSM, hosted primarily on AWS.
In this post, we detail why Coupa chose to migrate their self-managed Redis workloads to Amazon ElastiCache for Redis. We discuss details such as the benefits achieved, the migration journey, and the architectures involved.
Redis sits at the core of our platform, and Coupa uses it for a variety of use cases. These include user caching, storing user sessions, and job queues (through Resque, a Redis-backed library for creating background jobs).
Our core BSM application is tightly coupled with Redis—whenever an automated job is initiated, it calls to Redis and creates a queue. Caching improves query response time and relieves pressure from the underlying data stores by storing data in-memory for sub-millisecond response times.
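The caching behavior described above is commonly implemented as a cache-aside lookup. The following Python sketch illustrates the idea under stated assumptions and is not Coupa's implementation: `get_with_cache`, `DictCache`, and `load_supplier` are hypothetical names, and `DictCache` is an in-memory stand-in that mimics only the `get` and `setex` calls of a Redis client so the example runs without a server.

```python
import json
import time


class DictCache:
    """In-memory stand-in exposing the two Redis-style calls used below."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # expired entries behave like missing keys
            return None
        return value

    def setex(self, key, ttl_seconds, value):
        self._data[key] = (value, time.monotonic() + ttl_seconds)


def get_with_cache(cache, key, loader, ttl_seconds=300):
    """Return the cached value for key, calling loader() only on a miss."""
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    value = loader()  # slow path: query the underlying data store
    cache.setex(key, ttl_seconds, json.dumps(value))
    return value


db_calls = []


def load_supplier():
    # Hypothetical database query; each call is recorded so misses can be counted.
    db_calls.append(1)
    return {"id": 42, "name": "Example Supplier"}


cache = DictCache()
first = get_with_cache(cache, "supplier:42", load_supplier)
second = get_with_cache(cache, "supplier:42", load_supplier)  # served from cache
```

Only the first lookup reaches the loader; the second is answered from memory until the assumed 300-second TTL expires. Passing a real Redis client in place of `DictCache` should work the same way, provided it offers `get` and `setex` with these argument orders.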
We use Redis as a low-latency and high-concurrency data store to process fast-moving, ephemeral session data. We store a user’s visited pages and cookies for every session to determine when to initiate a session timeout. Before using Redis for this use case, we were sending all requests to our primary database, which caused heavy data reads and writes. Moving this use case to Redis reduced pressure on our database and improved response times for our customers.
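A session store with an idle timeout, as described above, can be sketched by rewriting the session key with a fresh expiry on every page view. This is an illustrative Python sketch only: the key layout, the 30-minute `SESSION_TTL`, and the names `FakeRedis`, `record_page_view`, and `is_active` are assumptions, and `FakeRedis` is a stand-in with a manually advanced clock so the timeout can be shown deterministically.

```python
import json


class FakeRedis:
    """Tiny stand-in for a Redis client: get/setex with a controllable clock."""

    def __init__(self):
        self.now = 0.0  # advanced manually to simulate elapsed seconds
        self._data = {}

    def setex(self, key, ttl_seconds, value):
        self._data[key] = (value, self.now + ttl_seconds)

    def get(self, key):
        value, expires_at = self._data.get(key, (None, 0.0))
        return value if self.now < expires_at else None


SESSION_TTL = 1800  # assumed 30-minute idle timeout, in seconds


def record_page_view(client, session_id, page):
    """Append a visited page and reset the idle timeout for the session."""
    key = f"session:{session_id}"
    raw = client.get(key)
    session = json.loads(raw) if raw else {"pages": []}
    session["pages"].append(page)
    client.setex(key, SESSION_TTL, json.dumps(session))  # sliding expiration
    return session


def is_active(client, session_id):
    """A session is active while its key has not expired."""
    return client.get(f"session:{session_id}") is not None


client = FakeRedis()
record_page_view(client, "abc", "/invoices")
client.now += 1000
session = record_page_view(client, "abc", "/orders")  # activity resets the timer
client.now += 1000
active_after_activity = is_active(client, "abc")
client.now += 1800
active_after_idle = is_active(client, "abc")
```

The read-modify-write in `record_page_view` is not atomic. Against a real Redis server, a hash or list per session combined with `EXPIRE` would avoid that race, at the cost of a slightly longer example.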
To support personalized feature flags in our application, we use Redis as an in-memory message queue with Redis lists for high-performance writes at the head and reads from the tail. Redis lists also enable client commands that listen for the next message in the queue and commands to move messages between lists on the server.
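The list-based queue described above (write at the head, read from the tail, move messages between lists) can be sketched as follows. This is a hedged illustration rather than Coupa's code: the queue names and the helpers `enqueue` and `process_next` are hypothetical, and `FakeListStore` is an in-memory stand-in for the Redis commands `LPUSH`, `RPOPLPUSH`, `LREM`, and `LLEN` so the example runs without a server.

```python
from collections import deque


class FakeListStore:
    """In-memory stand-in for the Redis list commands used below."""

    def __init__(self):
        self._lists = {}

    def lpush(self, name, *values):
        lst = self._lists.setdefault(name, deque())
        for value in values:
            lst.appendleft(value)  # LPUSH inserts at the head
        return len(lst)

    def rpoplpush(self, src, dst):
        lst = self._lists.get(src)
        if not lst:
            return None
        value = lst.pop()  # take from the tail of src ...
        self._lists.setdefault(dst, deque()).appendleft(value)  # ... push to dst
        return value

    def lrem(self, name, count, value):
        # Simplified: supports only count >= 0 (0 means remove all matches).
        lst = self._lists.get(name, deque())
        removed = 0
        while value in lst and (count == 0 or removed < count):
            lst.remove(value)
            removed += 1
        return removed

    def llen(self, name):
        return len(self._lists.get(name, ()))


def enqueue(client, message):
    client.lpush("queue:pending", message)


def process_next(client, handler):
    """Move one message to a processing list, handle it, then acknowledge it."""
    message = client.rpoplpush("queue:pending", "queue:processing")
    if message is None:
        return None
    handler(message)
    client.lrem("queue:processing", 1, message)  # acknowledge after success
    return message


client = FakeListStore()
for msg in ("job-1", "job-2", "job-3"):
    enqueue(client, msg)

handled = []
while process_next(client, handled.append) is not None:
    pass
```

Messages come out in the order they were enqueued, and a message that fails in `handler` stays in `queue:processing` where it can be retried. On a real server, the blocking variants (`BRPOP`, `BRPOPLPUSH`) let a consumer wait for the next message instead of polling, and Redis 6.2 and later offer `LMOVE` as the replacement for `RPOPLPUSH`.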
The previous self-managed Redis deployment at Coupa was shared with other services, which made management and operations complex. We were regularly tasked with provisioning, software patching and setup, configuration, and ongoing backups. We also had to implement our own error monitoring and perform remediation, and we had trouble scaling the self-hosted environment seamlessly. Finally, we had to monitor replication performance between nodes. All of this required additional investment in people, processes, and software just to operate the self-managed environment.
Our team of Coupa technical leads used the following criteria as evaluation parameters for a new Redis deployment:
- Alerting, monitoring, and integration with Amazon CloudWatch
- Version upgrade capabilities
- Centralized service deployments with integration into existing infrastructure such as code automation tools
- High availability (HA)
- Ease of maintenance and scaling
- Continued use of a single primary node and replica configuration (cluster mode disabled)
Why AWS and Amazon ElastiCache
Coupa was one of the pioneering SaaS-based products hosted entirely on AWS, from its inception in 2006. Over the years, Coupa has favored AWS managed services like Amazon Elastic Container Service (Amazon ECS), Amazon Simple Notification Service (Amazon SNS), and Amazon Route 53. That led us to consider ElastiCache for Redis as a potential replacement for our self-hosted Redis. We evaluated multiple options available on the market, including other managed services and improvements to our self-hosted infrastructure. After this evaluation, we opted for ElastiCache for Redis because it offers the following:
- Managed service benefits
- Simple pricing
- Open-source compatibility
- Drop-in replacement for existing workloads
- Reduced maintenance (multiple hosted Redis clusters can be replaced with one ElastiCache for Redis cluster)
- Ease of operations like upgrades and vertical scaling
Security was another major factor in our decision to adopt ElastiCache. At Coupa, it’s important that our application remains FedRAMP compliant. Instead of building out a separate, self-managed cache in house to ensure that our Redis workloads are compliant, we opted to move our Redis workloads to ElastiCache, which is already FedRAMP authorized.
Migrating to ElastiCache for Redis
Here at Coupa, we run different types of workloads, so we chose migration strategies based on the distinct requirements of each dependent application. Our goal with the migration was to create a seamless customer experience with minimal downtime. For example, if a user logs in a few minutes before we want to update our ecosystem, we must make sure they don’t get logged out during the update. To avoid issues like this, we developed two strategies:
- For one of our central applications, we migrated existing keys to the ElastiCache for Redis cluster
- In another core product, we ran two worker processes to migrate Redis keys from both new and previous Redis deployments
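The post does not describe the mechanics of these two strategies, so the following Python sketch is only one possible reading of them. `copy_keys` illustrates the first strategy by copying string keys and their remaining TTLs to the new deployment. `drain` illustrates the second by letting one worker finish the jobs left on the previous deployment while another consumes from the new one. All names are hypothetical, and `FakeRedis` is an in-memory stand-in for the handful of Redis calls used.

```python
from collections import deque


class FakeRedis:
    """In-memory stand-in with just the string and list calls used here."""

    def __init__(self):
        self.strings = {}  # key -> (value, remaining TTL in ms, or -1 for none)
        self.lists = {}

    def set(self, key, value, px=None):
        self.strings[key] = (value, px if px is not None else -1)

    def get(self, key):
        return self.strings.get(key, (None, -1))[0]

    def pttl(self, key):
        # Mirrors PTTL: -2 for a missing key, -1 for a key without expiry.
        return self.strings[key][1] if key in self.strings else -2

    def scan_iter(self):
        return iter(list(self.strings))

    def lpush(self, name, value):
        self.lists.setdefault(name, deque()).appendleft(value)

    def rpop(self, name):
        lst = self.lists.get(name)
        return lst.pop() if lst else None


def copy_keys(old, new):
    """Strategy 1: copy every string key to the new deployment, keeping TTLs."""
    copied = 0
    for key in old.scan_iter():
        ttl_ms = old.pttl(key)
        if ttl_ms in (-2, 0):
            continue  # key is gone or about to expire; nothing worth copying
        new.set(key, old.get(key), px=None if ttl_ms == -1 else ttl_ms)
        copied += 1
    return copied


def drain(client, queue, handler):
    """Strategy 2 worker: process jobs from one deployment until it is empty."""
    processed = 0
    while (job := client.rpop(queue)) is not None:
        handler(job)
        processed += 1
    return processed


old, new = FakeRedis(), FakeRedis()
old.set("session:abc", "user-7", px=60_000)
old.lpush("queue:jobs", "job-old-1")
old.lpush("queue:jobs", "job-old-2")

copied = copy_keys(old, new)          # existing sessions survive the cutover
new.lpush("queue:jobs", "job-new-1")  # producers now write only to the new side

done = []
from_old = drain(old, "queue:jobs", done.append)  # worker 1: previous deployment
from_new = drain(new, "queue:jobs", done.append)  # worker 2: new deployment
```

Because sessions are copied with their remaining TTL, a user who logged in shortly before the cutover would keep the same session afterward. The sketch handles only string keys; a real migration of mixed data types would more likely rely on `DUMP` and `RESTORE` or a replication-based approach.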
Our internal orchestration automates the process to onboard any application, updating the required configuration and integration for deployments. The following diagram shows the architecture of this migration.
With this strategy, a cutover that could have required an hour of downtime caused only milliseconds of delay. We were able to complete the migration without impacting our customers’ experiences.
Furthermore, we’ve been able to cut operations time and effort significantly. Instead of monitoring our Redis workloads ourselves, we rely on the ElastiCache integration with CloudWatch to monitor them and alert us. Plus, we don’t need to build anything from scratch: when we need to start a new project, the ecosystem is already in place.
Coupa completed the migration to ElastiCache for Redis in Q1 of 2021, replacing self-managed Redis. Since then, we have rolled out this update with limited availability, meaning our customers can use ElastiCache for Redis for caching in some of their use cases. Now we’re looking to develop this capability further by launching with general availability for all existing use cases and growing data size by 5x. While we work on this, Coupa continues to explore adopting additional ElastiCache for Redis features, such as role-based access control, Global Datastore, and Redis cluster architecture for greater future scalability.
About the Author
Ramesh Sencha works as a Lead Cloud Software Engineer at Coupa. He is responsible for the infrastructure platform’s development and productionalization. He also drives security programs like zero vulnerability and live patching. He is passionate about containerization, continuous delivery, and DevOps technologies.