Amazon Redshift DC2 migration approach with a customer case study

This is a guest post by Satoru Ishikawa, Solutions Architect at Classmethod in partnership with AWS.

In April 2025, AWS announced the deprecation of Amazon Redshift DC2 instances, guiding users to migrate to either Redshift RA3 instances or Redshift Serverless. Redshift RA3 instances and Redshift Serverless adopt a design that separates storage and compute, and offer new features such as data sharing, concurrency scaling for writes, zero-ETL, and cluster relocation.

In this post, we share insights from one customer's migration from DC2 to RA3 instances. The customer, a large enterprise in the retail industry, operated a 16-node dc2.8xlarge cluster for business intelligence (BI) and ETL workloads. Facing growing data volumes and disk capacity limitations, they migrated to RA3 instances using a Blue-Green deployment approach, achieving improved ETL query performance and expanded storage capacity while maintaining cost efficiency.

Amazon Redshift architecture types

Amazon Redshift offers two deployment options: Provisioned mode, where you choose the instance type and number of nodes and manage resizing as needed, and Redshift Serverless, which automatically provisions data warehouse capacity and intelligently scales the underlying resources. The following diagram compares these two architecture types.

Provisioned clusters require you to determine cluster size in advance, but you can optimize costs by purchasing Reserved Instances (RI) or scheduling pause and resume actions. Serverless automatically provisions resources as needed, with a pay-per-use model where you only pay for compute resources consumed. You can migrate between the two options in either direction, and both offer the same features, including SQL, zero-ETL, and Federated Query capabilities. For specific pricing details, see Amazon Redshift pricing.

Provisioned clusters are suitable for large-scale, predictable workloads and offer automatic scaling based on queuing. Serverless provides management-free automatic scaling for variable workloads with AI-driven optimization that scales based on workload complexity and data volumes. For more details, refer to Comparing Amazon Redshift Serverless to an Amazon Redshift provisioned data warehouse.

Customer case study: Migration from DC2 instances

This section describes the customer’s migration from Amazon Redshift DC2 to RA3 instance types. The migration used a Blue-Green deployment approach that minimized downtime while achieving both cost optimization and performance improvement.

The customer’s workload had the following characteristics:

Use cases

The customer had the following key use cases for their Amazon Redshift deployment:

Query via BI tool during business hours
1. High volume of read queries
2. Peak access on Mondays and at the beginning of each month
Data processing in the early morning
1. Concentrated write queries for data loading and transformation
Steady-state workload characteristics
1. Queries run for more than 16 hours daily

Requirements

The customer had the following key requirements for their Amazon Redshift migration:

Performance
1. Use auto-scaling (such as concurrency scaling) during peak access periods
Data size
1. Disk capacity expansion needed
Cost Management
1. Easy budget prediction and management
2. Use discount options for long-term usage
Compatibility
1. Maintain compatibility with existing applications and BI tools
2. Avoid endpoint changes
Availability
1. Maximum downtime of 8 hours acceptable during migration
Network
1. Do not modify the existing 2-Availability Zone (AZ) subnet configuration
When to migrate
1. To be conducted during low-load days and hours
2. Planned downtime possible within 8 hours

Key considerations in system design, implementation, and operation included extended operation hours, ease of budget prediction and management, cost optimization through Reserved Instances (RI), and maintaining compatibility with existing systems (avoiding endpoint changes). The customer evaluated Amazon Redshift Serverless, which offered attractive features such as a pay-per-use model, automatic scaling capabilities, and the potential for better price performance for variable workloads. While both Redshift Serverless and provisioned clusters could effectively support their workload patterns, the customer chose the provisioned model with RA3 nodes, leveraging their years of operational experience with provisioned environments, existing RI strategy, and established capacity planning approach.

Features of RA3 instance type

Built on the AWS Nitro System, RA3 instances with managed storage adopt an architecture that separates computing and storage, allowing independent scaling and separate billing for each component. These instances use high-performance SSDs for hot data and Amazon S3 for cold data, providing ease of use, cost-effective storage, and fast query performance. For more details, refer to Amazon Redshift RA3 instances with managed storage.

Migration prerequisites

The customer had the following migration prerequisites in place:

The customer used a Redshift cluster with 16 nodes of dc2.8xlarge configuration.
The customer chose a Blue-Green deployment approach for migration, where they would restore from a snapshot to RA3 instance type, enabling quick rollback if necessary.
The customer implemented cluster switching and rollback through endpoint switching using cluster identifier rotation.
Additionally, to improve performance with high concurrency, they transitioned the transaction isolation level from SERIALIZABLE ISOLATION to SNAPSHOT ISOLATION.
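
The isolation-level change in the last prerequisite comes down to a single SQL statement. The following is a minimal sketch, not the customer's actual procedure. It assumes `psql` is installed and that the standard libpq environment variables (`PGHOST`, `PGPORT`, `PGUSER`, `PGDATABASE`, `PGPASSWORD`) point at the cluster. The function name and the database name `analytics` are hypothetical. Amazon Redshift restricts when this statement can run, so check the ALTER DATABASE documentation before using it on a production database.

```shell
# Hypothetical helper: switch one database from serializable to snapshot isolation.
# Connection details are taken from the libpq environment variables (assumption).
set_snapshot_isolation() {
  target_db=$1
  psql -v ON_ERROR_STOP=1 \
    -c "ALTER DATABASE \"$target_db\" ISOLATION LEVEL SNAPSHOT;"
}

# Example call (not executed here; "analytics" is a placeholder database name):
#   set_snapshot_isolation analytics
```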

Cluster migration methods

There were two migration options available: Elastic Resize and Classic Resize.

Amazon Redshift’s Classic Resize functionality had been enhanced for resizing to RA3 instance types, significantly reducing the period during which writes are unavailable. In PoC testing, the cluster remained in the modifying state for 16 minutes after the resize was initiated and then became available. Based on these results, the customer proceeded with the Classic Resize approach.

Cluster sizing

Sizing involved determining the instance type and number of nodes for the migration target. The sizing took into account whether the workload was CPU-intensive (queries with high CPU usage), I/O-intensive (queries with heavy data reads and writes), or both. When migrating from DC2 instance types, additional nodes might be required depending on workload requirements. Nodes were added or removed based on the compute capacity needed to reach the required query performance.

Comparing configurations with similar cluster costs in terms of instance size and count, for a dc2.8xlarge 16-node cluster, the recommended configuration was 8 nodes of ra3.16xlarge. The following was the cost comparison in the Tokyo Region:

Recommended: dc2.8xlarge 16-node cluster => ra3.16xlarge 8-node cluster
1. $97.52/h ($6.095/h * 16 nodes) => $122.776/h ($15.347/h * 8 nodes)
Cost-focused: dc2.8xlarge 16-node cluster => ra3.16xlarge 6-node cluster
1. $97.52/h ($6.095/h * 16 nodes) => $92.082/h ($15.347/h * 6 nodes)
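
The arithmetic behind this comparison can be reproduced with a short script. This is a sketch that uses only the per-node hourly prices quoted above. The function names are made up for illustration, and the figures cover compute only, not managed storage.

```shell
# Hourly cluster cost = price per node-hour * number of nodes.
hourly_cost() {
  awk -v price="$1" -v nodes="$2" 'BEGIN { printf "%.3f\n", price * nodes }'
}

# Signed percentage change of a new hourly cost relative to a baseline.
pct_change() {
  awk -v base="$1" -v new="$2" 'BEGIN { printf "%+.1f\n", (new - base) / base * 100 }'
}

dc2=$(hourly_cost 6.095 16)     # dc2.8xlarge, 16 nodes
ra3_8=$(hourly_cost 15.347 8)   # ra3.16xlarge, 8 nodes (recommended)
ra3_6=$(hourly_cost 15.347 6)   # ra3.16xlarge, 6 nodes (cost-focused)

echo "dc2.8xlarge x16: \$$dc2/h"
echo "ra3.16xlarge x8: \$$ra3_8/h ($(pct_change "$dc2" "$ra3_8")% vs. DC2)"
echo "ra3.16xlarge x6: \$$ra3_6/h ($(pct_change "$dc2" "$ra3_6")% vs. DC2)"
```

On these prices, the 8-node configuration costs roughly a quarter more per hour than the DC2 cluster, while the 6-node configuration comes in slightly below it.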

For this migration, the customer proceeded with a cost-efficient 6-node ra3.16xlarge cluster to stay within existing budget constraints. However, because this node count could face throughput limitations at certain times, they enabled concurrency scaling on the RA3 cluster to handle access spikes.

Concurrency scaling provides up to 1 hour of free credits per day for each active cluster, accumulating up to 30 hours. On-demand usage fees apply when exceeding this free tier. The customer also considered using Elastic Resize to temporarily add nodes during peak loads, but rejected it because of the on-demand costs for the additional nodes and the brief disconnection during switching. They chose concurrency scaling instead.
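
One way to cap the number of concurrency scaling clusters is the `max_concurrency_scaling_clusters` parameter of the cluster parameter group. The following is a sketch under assumptions. The helper name is hypothetical, the parameter group name `cm-cluster` is borrowed from the restore command later in this post, and the value 2 matches the maximum used in the BI test. It also assumes the relevant WLM queues already have concurrency scaling mode set to auto, which is configured separately.

```shell
# Hypothetical helper: set the upper limit of concurrency scaling clusters
# in a cluster parameter group.
set_max_concurrency_scaling_clusters() {
  parameter_group=$1
  max_clusters=$2
  aws redshift modify-cluster-parameter-group \
    --parameter-group-name "$parameter_group" \
    --parameters "ParameterName=max_concurrency_scaling_clusters,ParameterValue=$max_clusters"
}

# Example call (not executed here):
#   set_max_concurrency_scaling_clusters cm-cluster 2
```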

Managed storage cost

RA3 instances use Redshift Managed Storage (RMS), which is charged at a fixed GB-month rate. Because the customer stored approximately 2 TB of data, the estimates had to include storage costs. For pricing details, see Amazon Redshift pricing.

Migration steps from DC2 to RA3

After creating an RA3 cluster from the DC2 cluster’s snapshot, the customer swapped the cluster identifiers. The following diagram shows this process.

1. Take a snapshot of the current DC2 cluster.
2. Restore an RA3 cluster from the snapshot under a different cluster identifier (Classic Resize).
3. Swap the cluster identifiers between the current DC2 cluster and the new RA3 cluster.

If any issues arise after the cluster switch, you can quickly roll back by returning the original DC2 cluster to its original cluster identifier.
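
The identifier swap and its rollback can be sketched with `aws redshift modify-cluster --new-cluster-identifier`. This is an illustration rather than the customer's actual runbook. The function names and the identifiers `cm-cluster` (assumed production identifier) and `cm-cluster-dc2` (assumed parking name for the old cluster) are hypothetical, while `for-ra3-20250207` is the identifier used in the restore command in this post.

```shell
# Rename one cluster and block until it is available under the new identifier.
rename_cluster() {
  aws redshift modify-cluster \
    --cluster-identifier "$1" \
    --new-cluster-identifier "$2" > /dev/null &&
  aws redshift wait cluster-available --cluster-identifier "$2"
}

# Move the live identifier from the current cluster to the incoming one.
#   $1 = identifier that applications connect to
#   $2 = current identifier of the cluster that should take over
#   $3 = parking identifier for the cluster that is being replaced
swap_cluster_identifiers() {
  live_id=$1
  incoming_id=$2
  parked_id=$3
  rename_cluster "$live_id" "$parked_id" &&
  rename_cluster "$incoming_id" "$live_id"
}

# Cutover (not executed here):
#   swap_cluster_identifiers cm-cluster for-ra3-20250207 cm-cluster-dc2
# Rollback is the same operation with the roles reversed:
#   swap_cluster_identifiers cm-cluster cm-cluster-dc2 for-ra3-20250207
```

Each rename is followed by a wait so that the second rename starts only after the first cluster is available again. The live identifier is briefly unassigned between the two renames, so the swap belongs inside the planned downtime window.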

Note: Restore from a snapshot

Running the restore operation using CLI commands is recommended to minimize operational errors and ensure reproducibility. The following is a sample command.

aws redshift restore-from-cluster-snapshot \
--cluster-identifier for-ra3-20250207 \
--snapshot-identifier cm-cluster-for-ra3-20250207 \
--cluster-subnet-group-name cm-cluster \
--vpc-security-group-ids sg-1234567a sg-2345678b sg-3456789c \
--cluster-parameter-group-name cm-cluster \
--node-type ra3.16xlarge \
--number-of-nodes 6 \
--port 5439 \
--no-publicly-accessible \
--enhanced-vpc-routing \
--availability-zone ap-northeast-1a \
--preferred-maintenance-window sat:17:00-sat:17:30 \
--automated-snapshot-retention-period 14 \
--iam-roles 'arn:aws:iam::123456789012:role/AmazonRedshift-CommandsAccessRole' 'arn:aws:iam::123456789012:role/AmazonRedshift-Spectrum' \
--maintenance-track-name current

Production migration duration

The time required for the restore and classic resize steps can vary significantly depending on data volume and target cluster specifications. The customer conducted a rehearsal beforehand to measure the actual required time.
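
A rehearsal measurement of this kind can be scripted. The sketch below is a hypothetical helper, not the customer's tooling. It records how long a cluster takes to report the available status, using the `cluster-available` waiter of the AWS CLI. Because a single `wait` call gives up after a limited number of polls, the helper retries it a bounded number of times (10 by default, an arbitrary choice).

```shell
# Hypothetical helper: measure how long a cluster takes to become available.
#   $1 = cluster identifier
#   $2 = maximum number of wait rounds (optional, defaults to 10)
time_until_available() {
  cluster_id=$1
  max_rounds=${2:-10}
  start=$(date +%s)
  round=0
  until aws redshift wait cluster-available --cluster-identifier "$cluster_id"; do
    round=$((round + 1))
    if [ "$round" -ge "$max_rounds" ]; then
      echo "gave up waiting for $cluster_id" >&2
      return 1
    fi
  done
  elapsed=$(( $(date +%s) - start ))
  echo "$cluster_id available after $((elapsed / 60)) min $((elapsed % 60)) s"
}

# Example call right after starting the restore (not executed here):
#   time_until_available for-ra3-20250207
```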

Test results

Before the production migration, the customer created a test cluster by restoring a snapshot to the RA3 instance type. While Redshift Test Drive is typically useful for workload testing, this customer faced unique constraints: enabling audit logging in their production cluster would require configuration changes, cluster restarts, and complex approval processes under their strict change management policies. To address this, they developed a custom load testing tool that captured workload patterns using Amazon Redshift system views (SYS_QUERY_HISTORY and SYS_QUERY_TEXT), which maintain 7 days of query history. The tool replayed 55,755 historical queries with 50-way parallelism against both DC2 and RA3 clusters, comparing metrics including query execution time, CPU utilization, and disk I/O. Query result caching was disabled during testing to ensure accurate comparisons.
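
The customer's tool is not public, so the following is only a rough sketch of two of the ideas described above: capturing a timing baseline from SYS_QUERY_HISTORY and replaying SQL with the result cache turned off for the session. It assumes `psql` with connection settings in the libpq environment variables. The function names and file names are hypothetical. The 50-way parallel replay and the reassembly of full query text from SYS_QUERY_TEXT are left out.

```shell
# Export a baseline of successful SELECT queries as CSV lines of
# query_id,start_time,elapsed_time (elapsed_time is in microseconds).
capture_query_baseline() {
  out_file=$1
  psql -v ON_ERROR_STOP=1 -At -F ',' -c "
    SELECT query_id, start_time, elapsed_time
    FROM sys_query_history
    WHERE query_type = 'SELECT'
      AND status = 'success'
    ORDER BY start_time;" > "$out_file"
}

# Replay a file of SQL statements in one session with result caching disabled,
# so repeated runs measure execution rather than cache hits.
replay_without_result_cache() {
  sql_file=$1
  psql -v ON_ERROR_STOP=1 \
    -c "SET enable_result_cache_for_session TO off;" \
    -f "$sql_file"
}

# Example calls (not executed here; file names are placeholders):
#   capture_query_baseline baseline_dc2.csv
#   replay_without_result_cache captured_queries.sql
```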

BI query performance

BI queries were tested using the custom load testing tool. The results represent the average execution time from 15 test runs of 55,755 queries executed with 50-way parallelism. Without concurrency scaling, the dc2.8xlarge 16-node cluster averaged 45.82 seconds per query, while the ra3.16xlarge 6-node cluster averaged 91.30 seconds. This indicated that RA3 instances showed longer execution times for short and medium queries in a direct migration without optimizations. However, enabling concurrency scaling improved RA3 performance progressively. With concurrency scaling enabled at maximum 2 clusters, the ra3.16xlarge 6-node cluster achieved an average of 72.48 seconds per query, a 21% improvement over the non-scaled configuration.

Node Type / Number of nodes	Average Query Time
ra3.16xlarge 6-node cluster	72.48 seconds

ETL query performance comparison

For long-running ETL queries (execution time greater than 10 minutes), the RA3 cluster demonstrated better performance than DC2. These results represented a direct migration of the customer’s workload with no optimizations applied.

For the Large-scale data load workload 1, the ra3.16xlarge cluster completed the query 28% faster than the dc2.8xlarge cluster (41 minutes vs. 57 minutes).
For the Complex transformation workload 1, the ra3.16xlarge cluster was 23% faster (1 hour 1 minute vs. 1 hour 20 minutes).

These results indicated that the RA3 node type was more performant for time-intensive data loading and transformation tasks. The higher CPU utilization values for RA3 suggested more effective compute resource usage.

Large-scale data load workload 1:

Node Type / Number of nodes	Average Query Time	MAXCPU%
ra3.16xlarge 6-node cluster	41 mins 09 seconds	11.45
dc2.8xlarge 16-node cluster	57 mins 07 seconds	10.85

Complex transformation workload 1:

Node Type / Number of nodes	Average Query Time	MAXCPU%
ra3.16xlarge 6-node cluster	1 hour 01 mins 33 seconds	74.23
dc2.8xlarge 16-node cluster	1 hour 20 mins 36 seconds	53.58

Performance tuning

Based on the test results, the customer identified that RA3 showed longer execution times for short and medium BI queries but faster performance for long-running ETL queries compared to DC2. To optimize overall performance, they focused on identifying slow queries and frequently referenced tables, prioritizing optimizations with the highest impact.

Performance tuning strategy

The customer considered several optimization strategies to leverage RA3’s architectural advantages. One key strategy involved pre-processing ad-hoc short and medium query workloads during low-load periods, creating pre-processed tables or materialized views for queries that repeatedly performed joins, aggregations, filters, and projections. RA3’s separated compute and storage architecture, with cost-effective large-scale storage, supported this approach.

Converting regular views to materialized views

Analysis of slow queries revealed that many views contained joins and that frequently referenced tables were being accessed multiple times through these views. As a countermeasure, the customer replaced frequently used regular views with materialized views, removing unnecessary data ranges and redundant columns.

Amazon Redshift supports incremental updates of materialized view contents via the REFRESH MATERIALIZED VIEW command, enabling efficient data updates.
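
As an illustration of this pattern, the sketch below creates a materialized view that precomputes a join and an aggregation over a restricted date range, and refreshes it separately during a low-load period. The table, column, and view names are invented for the example and are not the customer's schema. It assumes `psql` with connection settings in the libpq environment variables. Whether a refresh runs incrementally depends on the view definition.

```shell
# Create a materialized view that replaces a join-heavy regular view
# (hypothetical tables "sales" and "stores").
create_sales_mv() {
  psql -v ON_ERROR_STOP=1 <<'SQL'
CREATE MATERIALIZED VIEW mv_daily_store_sales AS
SELECT st.region, s.sale_date, SUM(s.amount) AS total_amount
FROM sales s
JOIN stores st ON st.store_id = s.store_id
WHERE s.sale_date >= '2024-01-01'
GROUP BY st.region, s.sale_date;
SQL
}

# Refresh the view, for example from a scheduled job after the nightly load.
refresh_sales_mv() {
  psql -v ON_ERROR_STOP=1 -c "REFRESH MATERIALIZED VIEW mv_daily_store_sales;"
}

# Example calls (not executed here):
#   create_sales_mv
#   refresh_sales_mv
```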

Materialized views and query rewrite

By converting regular views to materialized views, existing queries may be automatically optimized through the “query rewrite” feature provided by the query planner. For more details, refer to “Automatic query rewriting to use materialized views”.

Automatic tuning with AutoMV

On the DC2 cluster, disk utilization consistently exceeded 80%, which disabled the AutoMV feature due to insufficient disk space. With RA3’s expanded storage, automatic tuning through AutoMV became possible, leading to further performance improvements. For more details about AutoMV, refer to Automated materialized views.

Performance tuning results

After applying these optimizations, the customer achieved the following results:

Maintained existing performance while controlling cost increases
Achieved higher CPU utilization while maintaining throughput
Increased throughput during peak load periods through concurrency scaling

Conclusion

In this post, you learned how a large retail enterprise successfully migrated from Amazon Redshift DC2 to RA3 instances. The Blue-Green deployment approach enabled a safe migration with quick rollback capability, while the separated compute and storage architecture of RA3 provided flexibility to handle growing data volumes. Although RA3 showed different performance characteristics for short BI queries compared to DC2, the customer achieved significant improvements in long-running ETL query performance (up to 28% faster for data loads and 23% faster for complex transformations). By leveraging RA3-specific features such as materialized views and AutoMV, they optimized overall query performance while maintaining cost efficiency through Reserved Instances and concurrency scaling.

To continue your RA3 migration journey, see “Best practices for upgrading from Amazon Redshift DC2 to RA3 and Amazon Redshift Serverless” and “Resize Amazon Redshift from DC2 to RA3 with minimal or no downtime” for additional guidance.

AWS Big Data Blog