AWS Big Data Blog

Manage and control your cost with Amazon Redshift Concurrency Scaling and Spectrum

Amazon Redshift is a fast, fully managed cloud data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing business intelligence tools.

This post shares the simple steps you can take to use the new Amazon Redshift usage controls feature to monitor and control your usage and associated cost for Amazon Redshift Spectrum and Concurrency Scaling features. Redshift Spectrum enables you to power a lake house architecture to directly query and join data across your data warehouse and data lake, and Concurrency Scaling enables you to support thousands of concurrent users and queries with consistently fast query performance.

Why this feature is important

With tens of thousands of customers, the Amazon Redshift team has the benefit of observing the workloads and behavior across a large variety of customers, including the internal teams at Amazon. We observed that across both internal and external customers, Amazon Redshift is running demanding workloads, such as extract, transform, and load (ETL) pipelines that condense several massive datasets, many of which are hundreds of terabytes in size, into single consumable tables for reporting and analytic purposes. These jobs take advantage of Concurrency Scaling to automatically scale Amazon Redshift query processing to handle burst workloads, and Redshift Spectrum to perform analytics on the transformed datasets by joining them with external data stored in a variety of open data formats in the data lake backed by Amazon Simple Storage Service (Amazon S3). Both features are charged based on usage, which makes cost management important, especially during peak periods when an organization has more activity. Although a combination of Amazon CloudWatch alarms and workload management (WLM) query monitoring rules can help you track and monitor usage, customers have asked to control their cost at the Amazon Redshift cluster level based on usage.

Usage limits to control Concurrency Scaling and Redshift Spectrum costs

With the new usage controls feature, you can now monitor and control the usage and associated costs for Redshift Spectrum and Concurrency Scaling. You can create daily, weekly, and monthly usage limits, and define actions to take if those limits are reached to maintain predictable spending. Actions include logging usage stats as an event to a system table, generating Amazon Simple Notification Service (Amazon SNS) alerts, and disabling Redshift Spectrum or Concurrency Scaling based on your defined thresholds. The new usage controls feature allows you to continue reaping the benefits provided by both Concurrency Scaling and Redshift Spectrum with the peace of mind that you can stay within budget simply by configuring the appropriate thresholds.

Setting up and managing usage controls

You can configure Amazon Redshift usage control options on the Amazon Redshift console or by using the AWS Command Line Interface (AWS CLI) or API. You can set up to four limits per feature, allowing for multiple levels of logging or notifications before you disable Redshift Spectrum or Concurrency Scaling. The usage limit settings available are largely the same for both Concurrency Scaling and Redshift Spectrum usage. The main difference is that Concurrency Scaling usage limits are based on time spent (hours and minutes), while Redshift Spectrum usage limits are based on terabytes of data scanned. The fields you can adjust and select include the following:

  • Time – The time range for which your usage limits should be applied. You can choose daily, weekly, or monthly.
  • Usage limit – For Concurrency Scaling, this allows you to enter an integer value for hours and minutes to limit the amount of time this feature can be used before the usage limit kicks in. For Redshift Spectrum, you enter an integer value for the total number of terabytes you want to allow to be scanned before the limits apply.
  • Action – The action you want to take when your usage control limit has been reached. You can choose from Log to system table, Alert, or Disable feature. The Alert and Disable feature actions trigger a CloudWatch metric alarm, and you can optionally configure them to send Amazon SNS notifications.
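
The console accepts a Concurrency Scaling limit as hours and minutes, while the AWS CLI (covered later in this post) expects the same limit as total minutes. The following is a minimal sketch of that conversion; hhmm_to_minutes is a hypothetical helper name, and it assumes a two-digit hh:mm input.

```shell
# Convert an hh:mm duration (as entered on the console) into total minutes.
# hhmm_to_minutes is a hypothetical helper, not part of the AWS CLI.
hhmm_to_minutes() {
  hm_hours=${1%%:*}      # text before the first colon
  hm_minutes=${1#*:}     # text after the first colon
  # Strip one leading zero so values such as "08" are not read as octal.
  hm_hours=${hm_hours#0}
  hm_minutes=${hm_minutes#0}
  echo $(( ${hm_hours:-0} * 60 + ${hm_minutes:-0} ))
}

hhmm_to_minutes "01:30"   # prints 90
```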

The ability to configure up to four limits per feature, combined with the three available actions that you can take, provides accurate visibility into your current usage and a way to generate metrics of your usage patterns. You can use the Disable feature option to easily prevent going over budget, and the Alert and Log actions can provide valuable insights, such as how you are currently using Redshift Spectrum and Concurrency Scaling. For example, configuring a usage limit with a daily time period and an action to log to a system table allows you to easily generate metrics on which days you had higher utilization of Redshift Spectrum or Concurrency Scaling. These metrics can provide insights into high-traffic days and potential areas where your pipelines can be adjusted to better distribute your traffic. Another option is to configure a weekly alert limit at 25% of your desired monthly usage as a way to ensure that you are within your monthly expected budget.
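
The 25% weekly alert mentioned above is simple arithmetic. The following sketch shows it with integer math; weekly_alert_threshold is a hypothetical helper name and the budget figures are illustrative assumptions. Because the result is rounded down to a whole number, a 10 TB monthly budget would yield a 2 TB weekly alert rather than 2.5 TB.

```shell
# Compute a weekly alert threshold as 25% of a monthly budget.
# weekly_alert_threshold is a hypothetical helper; budgets below are examples.
weekly_alert_threshold() {
  # Integer division rounds down to a whole unit (TB or minutes).
  echo $(( $1 * 25 / 100 ))
}

weekly_alert_threshold 40     # 40 TB monthly Redshift Spectrum budget -> prints 10
weekly_alert_threshold 1200   # 20 hours (1200 minutes) of Concurrency Scaling -> prints 300
```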

Setting usage control limits on the Amazon Redshift console

To set usage limits for Concurrency Scaling and Redshift Spectrum using the new Amazon Redshift console, perform the following steps:

  1. On the Amazon Redshift console, choose
  2. Select your desired cluster.
  3. From the Actions drop-down menu, choose Configure usage limit.

  4. To configure usage limits for Concurrency Scaling, choose Configure usage limit in the Concurrency scaling usage limit section.
  5. To configure usage limits for Redshift Spectrum, choose Configure usage limit in the Redshift Spectrum usage limit section.
  6. In the Configure usage limit section, select or deselect Concurrency scaling and Redshift Spectrum.

Selecting one of those options brings up the corresponding configuration windows.

  7. Choose a Time period (Daily, Weekly, or Monthly) from the drop-down menu.
  8. Enter your desired Usage limit.
  9. From the Action drop-down menu, choose an action (Alert, Log to system table, or Disable feature).
  10. To configure additional usage limits, choose Add another limit and action.
  11. When you have configured all your desired usage limits, choose Configure to confirm your usage limit settings.

Your configurations are now visible in the usage limit dashboard.

Managing usage control limits via the Amazon Redshift console

You can edit and delete usage limits on the Amazon Redshift console. The Edit option allows you to add limits or modify existing limit settings, and the Delete option deletes all configured limits for the corresponding service. To manage your configurations, perform the following steps:

  1. On the Amazon Redshift console, choose
  2. Select your desired cluster.
  3. From the Actions drop-down menu, choose Configure usage limit.
  4. To edit your existing usage limit configurations, choose Edit in the corresponding service box.

The editing option lets you add a new usage limit, remove a usage limit, or modify an existing usage limit and corresponding action.

The time period of an existing usage limit cannot be edited in place. To change it, remove the limit and add it again as a new usage limit with the desired time period.

To delete your Concurrency Scaling limits, choose Delete usage limit in the Concurrency scaling usage limit section.

To delete your Redshift Spectrum limits, choose Delete usage limit in the Redshift Spectrum usage limit section. Choosing Delete usage limit removes all limits configured for that service.

Setting usage control limits via the AWS CLI

You can also use the AWS CLI to add, edit, describe, or remove usage control configurations. The following examples outline the required CLI commands for each use case:

  • create-usage-limit – This command adds a new usage limit configuration for your Amazon Redshift cluster. The command should include the following parameters:
    • --cluster-identifier – The name of the cluster on which to apply the usage control.
    • --period – The time range for your usage limit. You can enter daily, weekly, or monthly for this parameter.
    • --feature-type – The service to which you want to apply this usage control. You can enter spectrum or concurrency-scaling for this parameter.
    • --limit-type – For Redshift Spectrum, this parameter should be set to data-scanned. For Concurrency Scaling, this should be set to time.
    • --amount – For Redshift Spectrum, this parameter should equal the total terabytes allowed to be scanned, in increments of 1 TB. For Concurrency Scaling, this parameter should be set to the total minutes allowed before limit actions are applied (on the console, you enter this value as hh:mm).
    • --breach-action – The action to take when you reach your configured limit. Possible values are log, emit-metric, or disable. emit-metric sends metrics to CloudWatch.

See the following example code:

aws redshift create-usage-limit --cluster-identifier <yourClusterIdentifier> --period <daily|weekly|monthly> --feature-type <spectrum|concurrency-scaling> --limit-type <data-scanned|time> --amount <yourDesiredAmount> --breach-action <log|emit-metric|disable>
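
Because you can configure up to four limits per feature, one possible pattern is a set of escalating tiers. The following sketch prints, rather than runs, three create-usage-limit commands for Redshift Spectrum. The helper name spectrum_limit_cmd, the cluster name my-cluster, and the 5, 8, and 10 TB thresholds are all illustrative assumptions.

```shell
# Print (dry run) a create-usage-limit command for one monthly Redshift Spectrum tier.
# spectrum_limit_cmd and "my-cluster" are hypothetical names used for illustration.
spectrum_limit_cmd() {
  # $1 = cluster identifier, $2 = amount in TB, $3 = breach action
  echo "aws redshift create-usage-limit --cluster-identifier $1 --period monthly --feature-type spectrum --limit-type data-scanned --amount $2 --breach-action $3"
}

# Escalating tiers: log at 5 TB, emit a CloudWatch metric at 8 TB, disable at 10 TB.
spectrum_limit_cmd my-cluster 5 log
spectrum_limit_cmd my-cluster 8 emit-metric
spectrum_limit_cmd my-cluster 10 disable
```

After reviewing the printed commands, you can run them individually against your own cluster.
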
  • describe-usage-limits – This command returns a JSON response that lists the configured usage limits for the cluster you choose. The response includes all the configurable fields, such as the limit type and breach actions, and includes a usage limit ID, which is required for the modify and delete commands. The describe command should include the following parameter:
    • --cluster-identifier – The cluster identifier for which you want to obtain the configured usage limits.

See the following example code:

aws redshift describe-usage-limits --cluster-identifier <yourClusterIdentifier>
{
    "UsageLimits": [
        {
            "LimitType": "data-scanned",
            "Period": "daily",
            "BreachAction": "log",
            "FeatureType": "spectrum",
            "UsageLimitId": "4257b96e-5b12-4348-adc2-4922d2ceddd2",
            "Amount": 1,
            "ClusterIdentifier": "cost-controls-demo"
        }
    ]
}
  • modify-usage-limit – This command allows you to modify an existing usage limit configuration on your Amazon Redshift cluster. This command requires the UsageLimitId for the limit you want to modify, which you can obtain by running the describe-usage-limits command. The modify-usage-limit command should include the following parameters:
    • --usage-limit-id – The ID of the usage limit that you want to modify. You can obtain this by running the describe-usage-limits command.
    • --amount – The new value for your limit threshold.
    • --breach-action – The new action to take if you reach your limit threshold.

See the following example code:

aws redshift modify-usage-limit --usage-limit-id "<yourUsageLimitID>" --amount <newAmount> --breach-action <newBreachAction>
  • delete-usage-limit – This command deletes a configured usage limit from your Amazon Redshift cluster. This command requires the UsageLimitId, which you can obtain by running the describe-usage-limits command. This command should have the following parameter:
    • --usage-limit-id – The ID of the limit that you want to delete.

See the following example code:

aws redshift delete-usage-limit --usage-limit-id "<yourUsageLimitID>"
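
Both modify-usage-limit and delete-usage-limit need an ID taken from the describe-usage-limits response. The following sketch pulls the ID out of a saved response and prints the follow-up commands as a dry run. The helper usage_limit_ids is hypothetical, and it assumes the pretty-printed, one-field-per-line JSON layout shown in the sample response earlier in this section. A JSON-aware tool would be more robust against other output formats. The new amount of 2 and the emit-metric action are illustrative values.

```shell
# Print every UsageLimitId found in describe-usage-limits JSON read from stdin.
# usage_limit_ids is a hypothetical helper that relies on one field per line.
usage_limit_ids() {
  sed -n 's/.*"UsageLimitId": *"\([^"]*\)".*/\1/p'
}

# Sample input based on the describe-usage-limits response shown in this post.
sample_response='{
    "UsageLimits": [
        {
            "LimitType": "data-scanned",
            "Period": "daily",
            "BreachAction": "log",
            "FeatureType": "spectrum",
            "UsageLimitId": "4257b96e-5b12-4348-adc2-4922d2ceddd2",
            "Amount": 1,
            "ClusterIdentifier": "cost-controls-demo"
        }
    ]
}'

limit_id=$(printf '%s\n' "$sample_response" | usage_limit_ids)

# Dry run: print the follow-up commands instead of executing them.
echo "aws redshift modify-usage-limit --usage-limit-id \"$limit_id\" --amount 2 --breach-action emit-metric"
echo "aws redshift delete-usage-limit --usage-limit-id \"$limit_id\""
```

Against a live cluster, you could pipe the output of describe-usage-limits into the same helper instead of using the saved sample.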

For more information, see Managing usage limits in Amazon Redshift.

Summary

The Amazon Redshift usage controls provide you with an easy way to monitor, alert, and limit the cost you incur when using Concurrency Scaling and Redshift Spectrum features. With up to four limits configurable per feature and options to log events, trigger Amazon SNS notifications, or disable the features altogether from the Redshift console and AWS CLI, you have all the tools needed to make sure you stay within your budget.

About the Authors

Vince Marchillo is a Solutions Architect within Amazon’s Business Data Technologies organization. Vince guides customers to leverage scalable, secure, and easy-to-use data lake architecture powered by AWS services and other technologies.

Maor Kleider is a product and database engineering leader for Amazon Redshift. Maor is passionate about collaborating with customers and partners, learning about their unique big data use cases and making their experience even better. In his spare time, Maor enjoys traveling and exploring new restaurants with his family.

Satish Sathiya is a Product Engineer at Amazon Redshift. He is an avid big data enthusiast who collaborates with customers around the globe to achieve success and meet their data warehousing and data lake architecture needs.