AWS Big Data Blog

Accelerate resizing of Amazon Redshift clusters with enhancements to classic resize

October 2023: This post was reviewed and updated to include the latest enhancements in Amazon Redshift’s resize feature.

Amazon Redshift has improved the performance of the classic resize feature for multi-node RA3 clusters and increased the flexibility of the cluster snapshot restore operation. You can use the classic resize operation to resize a cluster when you need to change the instance type or transition to a configuration that can’t be supported by elastic resize. Previously, this could take the cluster offline for many hours during the resize, but now the cluster is typically available to process queries within minutes. Clusters can also be resized when restoring from a snapshot.

In this post, we show you how classic resize with enhancements works and how it significantly improves cluster availability. We also walk through the steps to resize your Amazon Redshift cluster using classic resize with enhancements.

Existing resize options

We’ve worked closely with our customers to learn how their needs evolve as their data scales. To address and meet your ever-growing demands, you often have to resize your Amazon Redshift cluster and choose an optimal instance type that delivers the best price/performance. As of this writing, there are two ways you can resize your clusters: elastic resize and classic resize.

Of the two options, elastic resize is the fastest available resize mechanism because it works based on slice remapping instead of a full data copy. Classic resize is primarily used when the target cluster configuration is outside the slice ranges allowed by elastic resize. Let’s briefly discuss these scenarios before describing how the enhanced migration process helps.

Enhancements to classic resize

Enhancements to classic resize make the cluster available for reads and writes quickly, like elastic resize, while performing the same functions as classic resize, thereby offering the best of both approaches.

The enhancements are done in two stages:

  • Stage 1 (Critical path) – The first stage consists of migrating the metadata from the source cluster to the target cluster, during which the source cluster is in read-only mode. This typically takes only a short time. The cluster is then made available for read and write queries. All tables with KEY distribution style are temporarily stored with EVEN distribution; they are redistributed back to KEY style in Stage 2.
  • Stage 2 (Off critical path) – This stage involves redistributing the data according to the original distribution style. It runs in the background, off the critical path of migration from the source to the target cluster. The duration of this stage depends on the volume of data to redistribute, the cluster workload, the cluster node type, and so on.

Note that if the cluster is being heavily used while the Stage 2 process runs in the background, it might experience slowness. This is because the background processes run in low-priority queues, and Amazon Redshift gives higher priority to the workloads running on the cluster. To expedite data redistribution for any table, you can run ALTER TABLE to prioritize it; instead of waiting for the background processes, you can explicitly change the distribution style from EVEN back to KEY, as shown in the example below. You can also consider elastically resizing the cluster to a larger configuration before you start classic resize, running classic resize with enhancements, and then scaling back down with another elastic resize after Stage 2 completes. The additional capacity helps the Stage 2 process finish faster.
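For example, assuming a hypothetical table named sales that had KEY distribution on a customer_id column before the resize, a statement similar to the following converts it back to KEY distribution right away instead of waiting for the background process:

-- Hypothetical table and column names; adjust to your own schema
alter table sales alter diststyle key distkey customer_id;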

Let’s see how classic resize with enhancements works with configuration changes.

Prerequisites

Complete the following prerequisite steps:

  1. The target cluster should be an RA3 node type with a minimum of two nodes. The source cluster can be DC2, DS2, or RA3.
  2. Take a snapshot of the cluster; it should be no more than 10 hours old when you start classic resize with enhancements.
  3. The size of the data must be below 2 PB. Reach out to AWS Support if you want to resize a cluster with more than 2 PB.
  4. The cluster should be in a VPC.
  5. Provide the AWS Identity and Access Management (IAM) role credentials that are required to run the AWS CLI. For more information, refer to Using identity-based policies (IAM policies) for Amazon Redshift.

Configuration change

As of this writing, you can use classic resize with enhancements to change the cluster configuration from DC2, DS2, and RA3 node types to any RA3 node type. However, changing from RA3 to DC2 or DS2 isn’t supported yet.

We benchmarked classic resize with enhancements across different cluster combinations and data volumes. The following table summarizes the results, comparing the critical path of classic resize to that of classic resize with enhancements.

Volume | Source cluster | Target cluster | Classic resize duration (min) | Classic resize with enhancements Stage 1 duration (min) | % faster
10 TB | ra3.4xlarge – 6 nodes | ra3.16xlarge – 8 nodes | 78 | 11 | 86%
10 TB | ra3.16xlarge – 8 nodes | ra3.4xlarge – 2 nodes | 738 | 11 | 99%
10 TB | dc2.8xlarge – 6 nodes | ra3.4xlarge – 2 nodes | 706 | 8 | 99%
3 TB | ra3.4xlarge – 2 nodes | ra3.16xlarge – 4 nodes | 53 | 11 | 79%
3 TB | ra3.16xlarge – 4 nodes | ra3.4xlarge – 2 nodes | 244 | 7 | 97%
3 TB | dc2.8xlarge – 6 nodes | ra3.4xlarge – 2 nodes | 251 | 7 | 97%

Classic resize with enhancements consistently completed in significantly less time and made the cluster available for read and write operations quickly. Classic resize took longer in all cases and kept the cluster in read-only mode, making it unavailable for writes. Also, the classic resize duration is comparatively longer when the target cluster configuration is smaller than the source cluster configuration.

Perform classic resize with enhancements

You can use either of the following two methods to resize your cluster to leverage the enhancements to classic resize.

Note: If you initiate classic resize from the console, the enhancements only apply to RA3 target node types; no enhancements run for DC2 or DS2 target node types.

  • Modify cluster method – Resize an existing cluster without changing the endpoint. The following are the steps involved:
    • Take a snapshot of the current cluster prior to performing the resize operation.
    • Determine the target cluster configuration and run the following command from the AWS CLI:
      aws redshift modify-cluster --region <CLUSTER REGION> \
      --endpoint-url https://redshift.<CLUSTER REGION>.amazonaws.com/ \
      --cluster-identifier <CLUSTER NAME> \
      --cluster-type multi-node \
      --node-type <TARGET INSTANCE TYPE> \
      --number-of-nodes <TARGET NUMBER OF NODES>

      For example:

      aws redshift modify-cluster --region us-east-1 \
      --endpoint-url https://redshift.us-east-1.amazonaws.com/ \
      --cluster-identifier my-cluster-identifier \
      --cluster-type multi-node \
      --node-type ra3.16xlarge \
      --number-of-nodes 12
  • Snapshot restore method – Restore an existing snapshot to a new cluster with a new cluster endpoint. The following are the steps involved:
    • Identify the snapshot for restore and a unique name for the new cluster.
    • Determine the target cluster configuration and run the following command from the AWS CLI:
      aws redshift restore-from-cluster-snapshot --region <CLUSTER REGION> \
      --endpoint-url https://redshift.<CLUSTER REGION>.amazonaws.com/ \
      --snapshot-identifier <SNAPSHOT ID> \
      --cluster-identifier <CLUSTER NAME> \
      --node-type <TARGET INSTANCE TYPE> \
      --number-of-nodes <TARGET NUMBER OF NODES>

      For example:

      aws redshift restore-from-cluster-snapshot --region us-east-1 \
      --endpoint-url https://redshift.us-east-1.amazonaws.com/ \
      --snapshot-identifier rs:sales-cluster-2022-05-26-16-19-36 \
      --cluster-identifier my-new-cluster-identifier \
      --node-type ra3.16xlarge \
      --number-of-nodes 12

Note: The snapshot restore method performs an elastic resize if the new configuration is within the allowed slice ranges; otherwise, it uses classic resize with enhancements.

Monitor the resize process

You can monitor the progress through the cluster management console. You can also check the events generated by the resize process. The resize completion status is logged in events along with the duration it took for the resize. The following screenshot shows an example.

It’s important to note that you may observe longer query times during the second stage. During the first stage, data for tables with KEY distribution style is transferred with EVEN distribution; background processes then redistribute the data back to the original distribution style (the distribution style before the cluster resize) in Stage 2. You can monitor the progress of these background processes by querying the stv_xrestore_alter_queue_state table. Tables with ALL or EVEN distribution styles don’t require redistribution post-resize, so they’re not logged in the stv_xrestore_alter_queue_state table. The counts you observe in this table are for tables that had KEY distribution style before the resize operation.

See the following example query:

select db_id, status, count(*) from stv_xrestore_alter_queue_state group by 1,2 order by 3 desc

In our test, the query results showed that data redistribution was finished for 60 tables, pending for 323 tables, and in progress for 1 table.
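If you also want to see which tables have already been converted back to KEY distribution, one option (a general-purpose check, not specific to the resize process) is to inspect the current distribution style of each table in the svv_table_info system view:

-- Tables still pending conversion show EVEN; converted tables show KEY(<column>)
select "schema", "table", diststyle
from svv_table_info
order by "schema", "table";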

In addition to stv_xrestore_alter_queue_state, a new system view, svl_restore_alter_table_progress, provides a more detailed view of the status. It shows the exact percentage of the conversion that each table has completed.

The following is a sample query:

select * from svl_restore_alter_table_progress;

The result is the following:

tbl   | progress |                          message                          
--------+----------+-----------------------------------------------------------
 105614 | ABORTED  | Abort:Table no longer contains the prior dist key column.
 105610 | ABORTED  | Abort:Table no longer contains the prior dist key column.
 105594 | 0.00%    | Table waiting for alter diststyle conversion.
 105602 | ABORTED  | Abort:Table no longer contains the prior dist key column.
 105606 | ABORTED  | Abort:Table no longer contains the prior dist key column.
 105598 | 100.00%  | Restored to distkey successfully.

We ran tests to assess the time to complete redistribution. For 10 TB of data, redistribution took approximately 5 hours and 30 minutes on an idle cluster. For 3 TB, it took approximately 2 hours and 30 minutes on an idle cluster. The following is a summary of tests performed on larger volumes:

  • A snapshot with 100 TB where 70% of blocks need redistribution would take 10–40 hours
  • A snapshot with 450 TB where 70% of blocks need redistribution would take 2–8 days
  • A snapshot with 1,600 TB where 70% of blocks need redistribution would take 7–27 days

The actual time to complete redistribution is largely dependent on data volume, cluster idle cycles, target cluster size, data skewness, and more. Therefore, we recommend performing classic resize with enhancements when there is enough of an idle window (such as weekends) for the cluster to perform redistribution.

Limitations

Below are some limitations to consider:

  1. Snapshots created before classic resize with enhancements can’t be used for table-level restore or other purposes.
  2. The sort order of a table might change after Stage 1. Similar to a table’s distribution key, the sort order is restored in the background over time. If required, you can sort the table manually without waiting for the background processes (see the example after this list). The sort order changes in the following scenarios:
    • If the target cluster configuration has more data slices after the resize, tables with distribution keys will lose their sort order. The sort order is recovered after the table’s distribution key is restored.
    • If the target cluster configuration has fewer data slices after the resize, tables with KEY or EVEN distribution style will lose their sort order.
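For example, to re-sort a hypothetical table named sales manually instead of waiting for the background processes, you can run a VACUUM with the SORT ONLY option:

-- Hypothetical table name; re-sorts the rows without reclaiming space
vacuum sort only sales;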

Summary

In this post, we talked about the improved performance of Amazon Redshift’s classic resize feature and how classic resize with enhancements significantly improves your ability to scale your Amazon Redshift clusters using the classic resize method. We also talked about when to use the different resize operations based on your requirements. We demonstrated how it works using the AWS CLI and how to monitor the progress from the console. We also showed the results of our benchmark tests and how classic resize with enhancements significantly improves the migration time for configuration changes to your Amazon Redshift cluster.

To learn more about resizing your clusters, refer to Resizing clusters in Amazon Redshift. If you have any feedback or questions, please leave them in the comments.


About the authors

Satesh Sonti is a Sr. Analytics Specialist Solutions Architect based out of Atlanta, specialized in building enterprise data platforms, data warehousing, and analytics solutions. He has over 16 years of experience in building data assets and leading complex data platform programs for banking and insurance clients across the globe.

Krishna Chaitanya Gudipati is a Senior Software Development Engineer at Amazon Redshift. He has been working on distributed systems for over 14 years and is passionate about building scalable and performant systems. In his spare time, he enjoys reading and exploring new places.

Jyoti Aggarwal is a Product Manager with Amazon Redshift. She leads the product and go-to-market strategy for zero-ETL and Redshift elasticity and scaling, including driving initiatives around performance, customer experience, and observability. She brings expertise in data warehousing and B2B/B2C customer experience.

Varuna Chenna Keshava is a Senior Software Development Engineer at AWS. She has worked on building end-to-end applications using different technology and database solutions over the last 10 years. She is passionate about building scalable, highly available solutions for customers and solving distributed systems problems. In her spare time, she loves to hike, play piano, and travel.