AWS News Blog

Amazon Redshift – New Features Galore

We have added a very large collection of new features to Amazon Redshift. You now have more options and more ways to organize and query your petabyte-scale data warehouse.

Here’s a summary:

  • Distributed Tables – You now have more control over the distribution of a table’s rows across compute nodes.
  • Remote Loading – You can now load data into Redshift from remote hosts across an SSH connection.
  • Approximate Count Distinct – You can now use a variant of the COUNT function to approximate the number of distinct values in a column or expression.
  • Workload Queue Memory Management – You can now apportion available memory across workload management (WLM) queues.
  • Key Rotation – You can now direct Redshift to rotate keys for an encrypted cluster.
  • HSM Support – You can now direct Redshift to use an on-premises Hardware Security Module (HSM) or AWS CloudHSM to manage the encryption master and cluster encryption keys.
  • Database Auditing and Logging – You can log connections and user activity to Amazon S3.
  • SNS Notification – Redshift can now issue notifications to an Amazon SNS topic when certain events occur.

Let’s take a deeper look at each of these new features.

Distributed Tables
As part of the query planning process, the Redshift optimizer determines where the data blocks need to be located in order to best execute the query. The data is then physically moved, or redistributed, during execution. This process can account for a substantial part of the cost of a query plan.

The storage for each compute node is divided into slices. Each XL compute node has two slices and each 8XL compute node has 16.

You can now choose to exercise additional control over the way that Redshift distributes table rows to compute nodes by choosing one of the following distribution styles when you create the table:

  • Even – Rows are distributed across slices in round-robin fashion.
  • Key – Rows with the same keys will tend to be stored on the same slice.
  • All – A copy of the entire table is distributed to every node.
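
To make the three styles concrete, here is a toy model in Python. It is only a sketch: the cluster shape, the `distribute` helper, and the use of CRC32 as the hash are assumptions made for illustration, and Redshift's real placement logic is internal to the service.

```python
from itertools import cycle
import zlib


def distribute(rows, style, num_nodes=2, slices_per_node=2, key=None):
    """Return {(node, slice): [rows]} for a toy cluster (illustrative only)."""
    slices = [(n, s) for n in range(num_nodes) for s in range(slices_per_node)]
    placement = {sl: [] for sl in slices}

    if style == "EVEN":
        # Round-robin: each row goes to the next slice in turn.
        for row, sl in zip(rows, cycle(slices)):
            placement[sl].append(row)
    elif style == "KEY":
        # Rows with the same key value hash to the same slice.
        # CRC32 is a stand-in hash, not what Redshift uses.
        for row in rows:
            h = zlib.crc32(str(row[key]).encode("utf-8"))
            placement[slices[h % len(slices)]].append(row)
    elif style == "ALL":
        # Every node receives a full copy of the table. Placing the copy
        # on the node's first slice is an assumption of this model.
        for n in range(num_nodes):
            placement[(n, 0)] = list(rows)
    else:
        raise ValueError(f"unknown distribution style: {style}")
    return placement
```

In SQL, you pick the style with the DISTSTYLE clause of CREATE TABLE (EVEN, KEY, or ALL), and name the column for the key style with DISTKEY.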

Read more about choosing a distribution style.

Remote Loading
The Redshift COPY command can now reach out to remote locations (Elastic MapReduce clusters, Amazon EC2 instances, and on-premises hosts) to load data from external sources.

In order to do this you must add the cluster’s public key to the remote host’s authorized keys file (or equivalent). You must also configure the remote host to accept incoming connections from all of the IP addresses in the cluster.

Next, you create a simple manifest file in JSON format and upload it to an Amazon S3 bucket. The manifest provides Redshift with the information that it needs to have in order for it to connect to the remote host and to retrieve the data.

Finally, you issue a COPY command, including a reference to the manifest file.
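
The last two steps can be sketched in Python. The manifest field names (`endpoint`, `command`, `mandatory`, `username`) and the `SSH` option on COPY reflect our understanding of the documented format, so check them against the COPY documentation before relying on them. The helper names and every value passed to them (host, bucket, table, credentials) are hypothetical.

```python
import json


def build_ssh_manifest(hosts):
    """Build the JSON manifest that tells Redshift which hosts to contact.

    Each host dict supplies the address to connect to, the command whose
    output will be loaded, and the login name on the remote host.
    """
    entries = []
    for host in hosts:
        entries.append({
            "endpoint": host["endpoint"],
            "command": host["command"],
            # When true, COPY fails if this host cannot be reached.
            "mandatory": host.get("mandatory", True),
            "username": host["username"],
        })
    return json.dumps({"entries": entries}, indent=2)


def build_copy_statement(table, manifest_url, credentials):
    """Build a COPY statement that reads through the SSH manifest in S3."""
    return (
        f"COPY {table}\n"
        f"FROM '{manifest_url}'\n"
        f"CREDENTIALS '{credentials}'\n"
        "SSH;"
    )
```

You would upload the string returned by `build_ssh_manifest` to your S3 bucket, then run the statement returned by `build_copy_statement` against the cluster with your usual SQL client.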

Read more about copying from remote hosts.

Approximate Count Distinct
You can now specify the APPROXIMATE option when you use Redshift’s COUNT DISTINCT function. Queries specified in this way use a HyperLogLog algorithm to approximate the number of distinct non-NULL values in a column or expression. This is much faster for large tables, and has a relative error of about 2%.
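
In SQL, the option goes in front of the function, as in SELECT APPROXIMATE COUNT(DISTINCT column) FROM table. If you are curious about the idea behind it, here is a textbook HyperLogLog sketch in Python. It is not Redshift's implementation: the class name, the register count, and the choice of SHA-1 as the hash are all assumptions made for illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Textbook HyperLogLog estimator (illustrative, not Redshift's code)."""

    def __init__(self, p=12):
        self.p = p                      # bits used to pick a register
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m
        # Bias-correction constant, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, value):
        if value is None:               # NULLs are not counted
            return
        digest = hashlib.sha1(repr(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")        # 64-bit hash
        idx = h >> (64 - self.p)                     # top p bits
        rest = h & ((1 << (64 - self.p)) - 1)        # remaining bits
        # Rank: position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        raw = self.alpha * self.m * self.m / sum(
            2.0 ** -r for r in self.registers
        )
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw
```

The memory used is fixed by the number of registers, no matter how many values are added, which is why this approach scales so well on large tables. The expected relative error shrinks as the number of registers grows.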

Read more about the COUNT function.

Workload Memory Management
A newly created Redshift cluster is configured with a single queue, capable of running five queries concurrently. You can add up to seven additional queues, each with a configurable level of concurrency, a user list, and a timeout. You can also create query groups. These are simply labels that you can assign to queries in order to direct them to a particular queue.

In order to provide you with additional control over the WLM (Workload Management) features of Redshift, you can now control how much of the available memory is used to process the queries in each queue. You can specify the desired values in the WLM section of the parameter group associated with the cluster.
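
If you manage parameter groups from code instead of the console, a WLM configuration can be sketched as a JSON document. The example below assumes the property names used by the `wlm_json_configuration` parameter (`query_concurrency`, `memory_percent_to_use`, `user_group`, `query_group`, `max_execution_time`), so verify them against the documentation. The helper name, the queue contents, and the group names are hypothetical.

```python
import json


def build_wlm_config(queues):
    """Serialize a list of queue definitions after a basic sanity check."""
    total = sum(q.get("memory_percent_to_use", 0) for q in queues)
    if total > 100:
        raise ValueError(f"memory allocations add up to {total}%, over 100%")
    return json.dumps(queues)


# Hypothetical layout: two custom queues plus the default queue (last).
example_queues = [
    {
        "query_group": ["reports"],      # queries labeled 'reports'
        "query_concurrency": 3,
        "memory_percent_to_use": 60,
    },
    {
        "user_group": ["etl_users"],     # queries run by these users
        "query_concurrency": 2,
        "memory_percent_to_use": 30,
        "max_execution_time": 600000,    # timeout in milliseconds
    },
    {
        "query_concurrency": 5,          # default queue
        "memory_percent_to_use": 10,
    },
]
```

A query is labeled for the first queue by running SET query_group TO 'reports'; in the session before issuing it.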

Read more about defining query queues.

Key Rotation
As you may know, Redshift uses three tiers of encryption keys to help protect data at rest. The randomly generated AES-256 block encryption keys encrypt data blocks in a cluster. The database key encrypts the block encryption keys and is in turn encrypted using a master key.

You can now direct Redshift to rotate the encryption keys for encrypted clusters. As part of the rotation process, keys are also updated for all of the cluster’s automatic and manual snapshots. You cannot rotate keys for snapshots that do not have a source cluster.

The cluster state transitions to ROTATING_KEYS for the duration of the rotation process, and returns to AVAILABLE when it completes.

Read more about database encryption and rotating encryption keys.

HSM Support
You can now opt to store your master and database encryption keys in an HSM (Hardware Security Module). Devices of this type provide direct control of key generation and management, and make key management separate and distinct from the application and the database.

You can use an on-premises HSM or AWS CloudHSM. Either way, you will need to configure a trusted network link between Amazon Redshift and the HSM using client and server certificates.

Read more about hardware security modules.

Database Auditing and Logging
Amazon Redshift logs information about connections and user activity related to your database, allowing you to monitor your cluster for security and troubleshooting purposes. You can now choose to have these logs delivered to Amazon S3 for secure and convenient access. Read the Database Auditing and Logging documentation to learn more.

SNS Notification
Redshift publishes notifications to Amazon Simple Notification Service (SNS) topics when events occur on Redshift clusters, snapshots, security groups, and parameter groups. There are four categories of notifications (Management, Monitoring, Security, and Configuration), two severity levels (Error and Info), and forty distinct types of events.

You enable these notifications by creating a subscription (a set of filters and an SNS topic) in the AWS Management Console.

After you select a category, severity, and source type, the console displays the events that you will receive.

You can route the notifications to current or newly created SNS topics. You can also create email subscriptions to the topic at the same time.
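
The same subscription can be described in code. The sketch below only builds and validates the request. Its keys mirror the parameter names of the Redshift CreateEventSubscription API as we understand them, so treat the key names, the lowercase category values, and the source type values as assumptions to verify. The helper name, subscription name, and topic ARN are hypothetical.

```python
CATEGORIES = {"management", "monitoring", "security", "configuration"}
SEVERITIES = {"ERROR", "INFO"}
SOURCE_TYPES = {
    "cluster",
    "cluster-snapshot",
    "cluster-security-group",
    "cluster-parameter-group",
}


def build_event_subscription(name, topic_arn, source_type, categories, severity):
    """Validate the filters and return the request parameters as a dict."""
    unknown = set(categories) - CATEGORIES
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity}")
    if source_type not in SOURCE_TYPES:
        raise ValueError(f"unknown source type: {source_type}")
    return {
        "SubscriptionName": name,
        "SnsTopicArn": topic_arn,
        "SourceType": source_type,
        "EventCategories": sorted(categories),
        "Severity": severity,
        "Enabled": True,
    }
```

For example, `build_event_subscription("cluster-security-errors", "arn:aws:sns:us-east-1:123456789012:example-topic", "cluster", ["security"], "ERROR")` describes a subscription to security events at the Error severity level on clusters.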

Read more about event notifications.

Time to Shift

These features will be rolling out to all clusters and regions over the course of the next two weeks!

— Jeff;

Jeff Barr

Jeff Barr is Chief Evangelist for AWS. He started this blog in 2004 and has been writing posts just about non-stop ever since.