AWS Database Blog
MaiCoin case study: Blue/green upgrade from Amazon ElastiCache Redis to Valkey
This is a guest post by Charles Hsiao, Site reliability engineer (SRE) at MaiCoin, in partnership with AWS.
MaiCoin is a leading cryptocurrency exchange and brokerage platform in Taiwan. It supports major assets such as Bitcoin (BTC), Ethereum (ETH), Litecoin (LTC), Tether (USDT), and USD Coin (USDC), among others. The MaiCoin platform previously ran on a set of Amazon ElastiCache for Redis OSS clusters.
Valkey is an open source, high-performance key-value datastore stewarded by the Linux Foundation and backed by more than 40 companies. Valkey is a drop-in replacement for Redis OSS, developed by long-standing Redis OSS contributors and maintainers, and has seen rapid adoption since the project's inception in March 2024. Amazon Web Services (AWS) actively contributes to the Valkey project. With growing market interest in Valkey, organizations including MaiCoin are seeking reliable migration methodologies and best practices.
Amazon ElastiCache for Valkey provides a fully managed service that simplifies this transition, offering automatic failover, built-in security, and seamless AWS integration. In this post, we dive deep into MaiCoin's business use cases and the self-managed blue/green deployment strategy it used to migrate from Amazon ElastiCache for Redis OSS to ElastiCache for Valkey with minimal downtime. After the migration, MaiCoin retains the fully managed capabilities of ElastiCache at 20–33% lower cost, because ElastiCache for Valkey is priced lower than ElastiCache for Redis OSS.
Overview of the current managed migration method for ElastiCache for Valkey
Amazon ElastiCache currently provides a managed in-place upgrade from Redis OSS to Valkey. In this approach, a new set of Valkey nodes is created in the same cluster and replicates data from the current primary nodes running Redis OSS. After replication completes, the service performs a failover. The failover promotes the new Valkey replicas to primaries, updates the cluster's DNS endpoint to point to the new Valkey nodes, and deletes the old Redis OSS nodes from the cluster.
You can use this method to upgrade the engine from Redis OSS to Valkey without needing to manage the upgrade process from the application side. As part of this upgrade, your application connections will automatically get redirected to the new nodes running Valkey.
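For comparison, the managed in-place upgrade is requested through the ElastiCache ModifyReplicationGroup API. The following minimal Python sketch assumes boto3 is installed and AWS credentials are configured. The replication group ID, engine version 8.0, parameter group name, and Region are hypothetical placeholders. MaiCoin did not take this path. The sketch only makes the managed method concrete.

```python
from typing import Any, Dict, Optional


def build_upgrade_request(
    replication_group_id: str,
    engine_version: str = "8.0",
    parameter_group: Optional[str] = None,
) -> Dict[str, Any]:
    """Keyword arguments for ElastiCache ModifyReplicationGroup (Redis OSS -> Valkey)."""
    request: Dict[str, Any] = {
        "ReplicationGroupId": replication_group_id,
        "Engine": "valkey",               # change the engine from redis to valkey
        "EngineVersion": engine_version,  # target Valkey engine version (assumed 8.0)
        "ApplyImmediately": True,         # start now instead of the next maintenance window
    }
    if parameter_group is not None:
        # A custom parameter group must belong to a Valkey parameter group family.
        request["CacheParameterGroupName"] = parameter_group
    return request


def start_upgrade(request: Dict[str, Any], region: str) -> Dict[str, Any]:
    """Send the request; requires boto3 and AWS credentials."""
    import boto3  # imported here so build_upgrade_request works without boto3

    client = boto3.client("elasticache", region_name=region)
    return client.modify_replication_group(**request)
```

For example, `start_upgrade(build_upgrade_request("my-redis-cluster"), region="ap-northeast-1")` would start the upgrade for a hypothetical cluster. Application connections keep using the same endpoint throughout.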
MaiCoin case study: The real-world challenges of Valkey migration in the crypto industry
In crypto trading and payments, millisecond-level latency and continuous availability aren’t optional—they’re existential. As MaiCoin transitioned core caching workloads from Amazon ElastiCache for Redis OSS to ElastiCache for Valkey, we faced challenges rooted in market dynamics and reliability expectations in a distributed system. The following section summarizes our key business constraints, technical trade-offs, validation approach, and business outcomes.
MaiCoin faced several business constraints in migrating from ElastiCache for Redis OSS to ElastiCache for Valkey. As noted above, the crypto industry's demands for continuous availability and millisecond-level latency make any migration harder. Specific constraints include:
- Around-the-clock crypto markets – Unlike traditional stock markets, which close overnight and on weekends, the crypto market operates continuously. Even with scheduled maintenance windows, migrations must account for live trading activity to avoid eroding user trust.
- Tight latency budgets – P95 and P99 latency targets in the low single-digit milliseconds leave little room for added latency from TLS handshakes, slot rebalancing, or cross-AZ traffic routing.
- Burst tolerance – Market volatility can trigger sudden surges in throughput and publish-subscribe (pub-sub) fan-out. The cache layer must remain stable not only under average load but also under peak load.
- Regulatory and audit readiness – Strong traceability of configuration changes, reproducible infrastructure, and clear data retention are required to meet internal governance and regulatory standards.
MaiCoin evaluated two migration strategies, each with distinct trade-offs: the managed in-place upgrade and a self-managed blue/green deployment.
The managed in-place upgrade offered simplicity: a single cluster endpoint, minimal application changes, and a quick cutover. However, it provided no live validation window and no way to roll back, which increased recovery complexity.
In contrast, the blue/green deployment strategy introduced higher operational overhead and temporarily doubled the cost of operation while both clusters ran. However, it provided clear advantages in control, safety, and observability. It enabled isolated validation, progressive traffic shifting, and a reliable rollback path if anomalies occurred after cutover.
Our trading and wallet workloads must stay available around the clock, meet tight latency budgets, and remain compliant with regulatory audits, so we chose the blue/green deployment. This approach let us keep full control of the migration process and reduce operational risk. It also gave us a clear rollback path aligned with our reliability standards.
To validate Valkey’s readiness, we ran comprehensive benchmarks comparing Redis 7.1.0, Valkey 7.2.6, and Valkey 8.0.1. We used redis-benchmark for command-level latency and throughput, and memtier_benchmark for sustained, mixed-workload concurrency testing. We measured latency (average, p50, p95, and p99), throughput (TPS), and memory efficiency (RSS, used memory, and fragmentation ratio). Test scenarios included GET/SET with 8-, 128-, and 2048-byte items, list operations (LPUSH and LPOP), set operations (SADD and SPOP), sorted sets (ZADD and ZPOPMIN), pipelining (-P 16), a 10-minute mixed read/write load, and a memory efficiency test with 1 million keys.
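The latency and memory metrics listed above can be derived from raw measurements along these lines. This is a minimal Python sketch. It assumes you have already collected per-request latency samples in milliseconds and the used_memory and used_memory_rss fields from INFO memory. It uses the nearest-rank percentile definition, which may differ slightly from how redis-benchmark or memtier_benchmark compute their reported percentiles.

```python
import math
from statistics import mean
from typing import Dict, Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of data at or below it."""
    if not samples or not 0 < pct <= 100:
        raise ValueError("need samples and 0 < pct <= 100")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank, always in 1..len
    return ordered[rank - 1]


def summarize_latency(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Average and tail latencies, in the same unit as the input (ms here)."""
    return {
        "avg": mean(samples_ms),
        "p50": percentile(samples_ms, 50),
        "p95": percentile(samples_ms, 95),
        "p99": percentile(samples_ms, 99),
    }


def fragmentation_ratio(used_memory_rss: int, used_memory: int) -> float:
    """RSS over allocator-reported memory, as INFO documents mem_fragmentation_ratio."""
    return used_memory_rss / used_memory
```

A ratio well above 1 means the process holds more resident memory than the dataset needs, which is why lower fragmentation counted toward production readiness.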
Across workloads, Valkey 8.0.1 achieved the highest throughput, lowest latency (p50–p99), and best memory efficiency, with Valkey 7.2.6 close behind. Under pipelining, Valkey led on SET throughput; in sustained load tests, Redis occasionally hit slightly higher peak TPS but with worse p99 latency. Overall, Valkey 8.0.1 provided the most stable tail latency and lowest memory fragmentation, indicating production readiness.
For data synchronization, we used RedisShake with a full sync (SCAN) followed by continuous updates through keyspace notifications (KSN).
We set rdb_restore_command_behavior=rewrite so that keys already present on the target were overwritten instead of halting the sync with a key conflict. Time to live (TTL) values were preserved. Combining the full scan with KSN-based incremental updates allowed smooth resynchronization after interruptions.
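The following toy in-memory model makes the two-phase sync concrete. It is not RedisShake code. Dictionaries stand in for the clusters, a full pass over the keys stands in for SCAN, and a list of changed key names stands in for keyspace notification events. The behavior names mirror the panic, rewrite, and skip options of rdb_restore_command_behavior as we understand them. Treat this as an illustration only.

```python
from typing import Dict, List, Optional, Tuple

# Toy store: key -> (value, absolute expiry in epoch ms, or None for no TTL).
Store = Dict[str, Tuple[str, Optional[int]]]
BEHAVIORS = ("panic", "rewrite", "skip")


def restore(target: Store, key: str, entry: Tuple[str, Optional[int]], behavior: str) -> None:
    """Write one key to the target, handling a key that already exists."""
    if behavior not in BEHAVIORS:
        raise ValueError(f"unknown behavior: {behavior}")
    if key in target and behavior == "panic":
        raise KeyError(f"target already has key {key!r}")
    if key in target and behavior == "skip":
        return
    # "rewrite" (or a new key): copy value and expiry unchanged, so TTLs survive.
    target[key] = entry


def full_sync(source: Store, target: Store, behavior: str = "rewrite") -> None:
    """Phase 1: copy every key once (the real tool walks the keyspace with SCAN)."""
    for key, entry in list(source.items()):
        restore(target, key, entry, behavior)


def apply_events(source: Store, target: Store, changed: List[str], behavior: str = "rewrite") -> None:
    """Phase 2: re-read each key named by a keyspace notification."""
    for key in changed:
        if key in source:
            restore(target, key, source[key], behavior)
        else:
            target.pop(key, None)  # key was deleted or expired on the source
```

The model shows why rewrite matters for resynchronization. After an interruption, rerunning the full pass over a partly populated target converges instead of failing on keys that were already copied.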
Solution overview for blue/green migration from ElastiCache Redis to ElastiCache Valkey
Blue/green deployment is a technique that reduces downtime and risk during migrations by maintaining two identical environments called blue and green. In this migration scenario, blue represents the current ElastiCache Redis cluster, and green represents the target ElastiCache Valkey cluster. At any time, only one environment actively serves production traffic, with the other serving as a staging environment for data synchronization and validation. This approach allows for a transition from ElastiCache for Redis OSS to ElastiCache for Valkey while maintaining data consistency and minimal service disruption.
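One way to implement progressive traffic shifting between blue and green is to route a stable, growing slice of clients to green. This hypothetical Python sketch hashes a client ID into 100 buckets, and the endpoint strings are placeholders. RedisShake replicates in one direction only (blue to green), so the sketch assumes that clients moved to green can tolerate their writes not appearing on blue. That holds for read-mostly or regenerable cache data, but you must verify it for your workload.

```python
import hashlib

# Hypothetical endpoints for the blue (Redis OSS) and green (Valkey) clusters.
BLUE = "blue-redis.example:6379"
GREEN = "green-valkey.example:6379"


def choose_endpoint(client_id: str, green_percent: int) -> str:
    """Return the endpoint a client should use at the current rollout percentage.

    Each client_id hashes to a fixed bucket in 0..99, so raising green_percent
    only moves clients from blue to green, and 0 sends everyone back to blue.
    """
    if not 0 <= green_percent <= 100:
        raise ValueError("green_percent must be between 0 and 100")
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return GREEN if bucket < green_percent else BLUE
```

Because buckets are fixed per client, a client never flips back and forth during a rollout. Setting green_percent to 0 is the rollback path.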
RedisShake is an open source tool with Valkey support, developed by tair-opensource at Alibaba Cloud. It moves data between Redis and Valkey clusters and provides several important capabilities: it synchronizes data in real time, works across AWS Regions, supports various Redis and Valkey deployment types, and runs with minimal impact on your source cluster's performance.
RedisShake supports blue/green deployments through its synchronization engine. The tool keeps your data consistent between your source ElastiCache for Redis cluster (blue environment) and your target ElastiCache for Valkey cluster (green environment). It copies data incrementally and maintains continuous synchronization, so you can switch traffic between environments smoothly when you’re ready.
We used the following key migration modes:
- PSYNC mode – Implements Redis’s built-in partial synchronization protocol, establishing replication connections for live production migrations with continuous data synchronization
- RDB mode – Processes Redis Database (RDB) snapshot files with compressed binary representations, ideal for scheduled bulk migrations during maintenance windows
- SCAN mode – Uses Redis SCAN commands to traverse the keyspace without blocking, perfect for low-impact production migrations with controlled resource consumption
Prerequisites
Before you perform this migration, you need to:
- Provision a source cluster, source-elasticache, and a target cluster, target-elasticache, in your AWS account with the same configuration, such as virtual private cloud (VPC) and instance types, except for the engine type or version.
- Set up a migration Amazon Elastic Compute Cloud (Amazon EC2) instance in the same VPC that can connect to both the source cluster and the target cluster.
- Request that the Amazon ElastiCache service team enable PSYNC on both the source and target clusters.
Solution walkthrough
To perform the migration, complete the following steps:
- Download and install RedisShake on the migration EC2 instance:
- Create the RedisShake configuration file (shake.toml):
Configuration parameters:
  - sync_reader: Source ElastiCache for Redis OSS cluster configuration
  - redis_writer: Target ElastiCache for Valkey cluster configuration
  - tls: Set to true for ElastiCache connections (required)
  - sync_rdb and sync_aof: Enable snapshot and real-time synchronization
  - cluster: Set to true if cluster mode is enabled
- Set up and execute the migration:
- Monitor the progress:
- Verify the migration using a verification script:
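A verification pass can be sketched as follows. This hypothetical Python sketch samples keys on the source with SCAN and compares type, value, and TTL on the target. It accepts any client that exposes redis-py style methods, for example redis.Redis(host=..., port=6379, ssl=True) for a cluster-mode-disabled endpoint. The sample size and the 5-second TTL tolerance are assumptions. Run it after replication lag has drained, because writes that land mid-check show up as transient mismatches.

```python
from itertools import islice
from typing import Any, Callable, Dict, List

# Type-specific readers using redis-py method names; other types are skipped.
READERS: Dict[str, Callable[[Any, Any], Any]] = {
    "string": lambda c, k: c.get(k),
    "hash": lambda c, k: c.hgetall(k),
    "list": lambda c, k: c.lrange(k, 0, -1),
    "set": lambda c, k: c.smembers(k),
    "zset": lambda c, k: c.zrange(k, 0, -1, withscores=True),
}


def _type(client: Any, key: Any) -> str:
    t = client.type(key)
    return t.decode() if isinstance(t, bytes) else t  # bytes unless decode_responses=True


def verify(source: Any, target: Any, sample: int = 1000, ttl_tolerance_ms: int = 5000) -> List[str]:
    """Return mismatch descriptions for up to `sample` keys; empty means they matched."""
    problems: List[str] = []
    for key in islice(source.scan_iter(count=500), sample):
        key_type = _type(source, key)
        if key_type not in READERS:
            continue  # "none" (expired meanwhile) or a type this sketch does not read
        if _type(target, key) != key_type:
            problems.append(f"{key!r}: missing or type differs on target")
            continue
        if READERS[key_type](source, key) != READERS[key_type](target, key):
            problems.append(f"{key!r}: value differs")
        src_ttl, dst_ttl = source.pttl(key), target.pttl(key)  # -1 means no expiry
        if (src_ttl < 0) != (dst_ttl < 0) or abs(src_ttl - dst_ttl) > ttl_tolerance_ms:
            problems.append(f"{key!r}: TTL differs ({src_ttl} vs {dst_ttl} ms)")
    return problems
```

For cluster mode enabled, run the check against each shard, or use a cluster-aware client whose scan_iter covers all primaries.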
Alternatives considered
Redis Input/Output Tools (RIOT) is another open source tool for moving data between Redis instances. Developed by Redis Ltd, this Java-based command line tool helps you migrate and synchronize data across different Redis environments. RIOT provides several key features: it can copy data in real time, works across different cloud providers, supports different types of Redis setups, and has minimal impact on your source Redis cluster while the migration runs. RIOT can support blue/green deployments by continuously copying data from the source ElastiCache for Redis OSS cluster (blue environment) to the target ElastiCache for Valkey cluster (green environment). This ongoing synchronization means you can switch traffic between environments smoothly.
RIOT became unmaintained as of October 2025, and as a result it wasn’t selected as the solution for this migration scenario.
Business outcome (why this matters)
MaiCoin achieved the following business improvements:
- Cost efficiency – Valkey delivers meaningful savings compared to Redis OSS: approximately 20% lower cost for ElastiCache self-designed clusters and up to 33% lower for ElastiCache Serverless. Combined with AWS Graviton-based instance types, this reduces the annual run rate without sacrificing performance.
- Operational safety – The blue/green strategy, combined with structured consistency checks, provides a repeatable and auditable migration framework for future upgrades.
- Performance headroom – Lower tail latency and improved memory utilization strengthen system resilience during market volatility.
- Future-proofing – With a proven rollback path and code-driven infrastructure provisioning, cluster expansion and version upgrades become predictable and low-risk operations.
Conclusion
This post explored how MaiCoin used RedisShake and a blue/green deployment strategy to migrate from Amazon ElastiCache for Redis OSS to Amazon ElastiCache for Valkey. RedisShake provides a reliable migration path with continuous data synchronization, so you can validate your target environment before switching production traffic. It handles large-scale migrations with detailed logging while keeping downtime near zero.
The key to successful migration is proper planning: set up network access between clusters, enable TLS on both endpoints, monitor replication lag, and validate thoroughly before switching traffic.
To get started, assess your current environment and your dataset size, then test the process in a nonproduction environment first. The continuous synchronization capability of RedisShake gives you confidence to validate thoroughly before switching production traffic.