AWS Big Data Blog

Amazon Redshift Engineering’s Advanced Table Design Playbook: Compound and Interleaved Sort Keys

 

Part 1: Preamble, Prerequisites, and Prioritization
Part 2: Distribution Styles and Distribution Keys
Part 3: Compound and Interleaved Sort Keys
Part 4: Compression Encodings
Part 5: Table Data Durability


In this installment, I’ll cover the different sort key options, when to use sort keys, and how to identify the optimal sort key configuration for your tables. I’ll also present another methodology for working through your specific workload, with concrete guidance on how to use sort keys properly for performance.

Defining a table with a sort key results in the physical ordering of data within each slice, based on the sort type and the columns chosen in the key definition. In Amazon Redshift, we allow for a table to be defined with compound sort keys, interleaved sort keys, or no sort keys. Each of these styles of sort key is useful for certain table access patterns. In practice, a compound sort key is most appropriate for the vast majority of Amazon Redshift workloads.

There are numerous benefits of ordering your data in Amazon Redshift:

  1. You can reduce disk I/O by improving zone map effectiveness.
  2. You can reduce compute overhead and I/O by avoiding or reducing the cost of sort steps.
  3. You can improve join performance by enabling MERGE JOIN operations.
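
For reference, here’s a minimal sketch of the DDL for each sort key style, using a hypothetical visits table (the table and column names are illustrative, not from this playbook):

-- Compound: rows are ordered by visit_date first, then by user_id within each date
CREATE TABLE visits_compound (
    user_id    INTEGER,
    visit_date DATE
)
COMPOUND SORTKEY (visit_date, user_id);

-- Interleaved: each sort key column is given equal weight in the sort order
CREATE TABLE visits_interleaved (
    user_id    INTEGER,
    visit_date DATE
)
INTERLEAVED SORTKEY (visit_date, user_id);

-- No sort key at all is also valid, and appropriate for some tables
CREATE TABLE visits_unsorted (
    user_id    INTEGER,
    visit_date DATE
);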

Starting at the most fundamental question, and diving deeper, we can construct a logical flowchart applicable to any table that can be used for identifying the ideal sort type and sort key columns for your workload.

Will queries against my tables benefit from a sort key?

Almost always the answer is yes. However, in a few edge cases sorting a table doesn’t result in a performance improvement and only adds minor overhead to data ingestion.

As discussed earlier, you most often use sort keys to improve the effectiveness of the zone maps, which results in reduced block I/O for read operations. The next most frequent benefit provided by sort keys is sorting to remove, or reduce the cost of, sort steps required by SQL operations like ORDER BY, PARTITION BY, GROUP BY, and so on. Least common, but still important, is sorting to facilitate a MERGE JOIN operation. MERGE JOIN is the fastest of the three JOIN operations supported by Amazon Redshift.

If you have a table that is accessed in a pattern where none of these three optimizations benefits you, then you have one of the few scenarios where defining a sort key makes no difference. In these circumstances, you don’t need to specify a sort key.

Together, these steps can be reduced to “Does this table benefit from sorting?” Expanded, that question looks like the following:

[Flowchart: Does this table benefit from sorting?]

For tables that benefit from either a compound or interleaved sort key, you can ask additional questions to determine which type of key is most appropriate. These questions are covered in greater detail in the expanded flowchart here:

[Flowchart: Identifying the appropriate sort key type and columns]

Will sorting enable MERGE JOINs?

A MERGE JOIN will work best when the following criteria are met (see the sketch after this list):

  1. Two tables are sorted (compound) and distributed on the same columns.
  2. Both tables are > 80% sorted (svv_table_info.unsorted < 20%).
  3. These tables are joined, specifying the DISTKEY and SORTKEY columns in the JOIN condition.
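
To make these criteria concrete, here’s a minimal sketch of two hypothetical tables that satisfy them, along with a join that can qualify for a MERGE JOIN:

CREATE TABLE fact_sales (
    customer_id INTEGER,
    amount      NUMERIC(12,2)
)
DISTKEY (customer_id)
COMPOUND SORTKEY (customer_id);

CREATE TABLE dim_customer (
    customer_id INTEGER,
    name        VARCHAR(64)
)
DISTKEY (customer_id)
COMPOUND SORTKEY (customer_id);

-- The JOIN condition names the shared DISTKEY/SORTKEY column, so the
-- planner can choose a MERGE JOIN while both tables remain > 80% sorted
SELECT c.name, SUM(s.amount) AS total_sales
FROM fact_sales s
JOIN dim_customer c ON s.customer_id = c.customer_id
GROUP BY c.name;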

You can use the following query to identify the tables that have a column that’s defined as both a DISTKEY and as a column within a compound SORTKEY:

SELECT * FROM admin.v_extended_table_info
WHERE table_id IN (
    -- Tables with a column that is both the DISTKEY and part of the sort key
    SELECT DISTINCT attrelid FROM pg_attribute 
    WHERE attisdistkey = true AND attsortkeyord > 0
    MINUS
    -- Remove tables whose sort key is interleaved
    SELECT DISTINCT attrelid FROM pg_attribute
    WHERE attsortkeyord = -1
);
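
The admin.v_extended_table_info view referenced here is one of the administrative views available in the amazon-redshift-utils repository on GitHub (Part 1 of this series covers prerequisites like this). Also, in this query and the ones that follow, [table_id] is a placeholder; one way to find the ID for a particular table is:

-- Look up a table's ID by schema and name
SELECT table_id FROM svv_table_info
WHERE schema = 'public' AND "table" = 'orders';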

The following query returns the number of queries that have these criteria:

  1. Scan the table you’re reviewing, described by table_id
  2. Scan a table with a column that’s defined as both a DISTKEY and as a column within a compound SORTKEY
  3. Perform some type of JOIN operation

SELECT COUNT(*) num_queries FROM stl_query
WHERE query IN (
  -- Queries that scan the table under review
  -- (type = 2: scans of permanent user tables; userid > 1 excludes system queries)
  SELECT DISTINCT query FROM stl_scan 
  WHERE tbl = [table_id] AND type = 2 AND userid > 1
  INTERSECT
  -- ...that also scan a table with a DISTKEY column in a compound sort key
  SELECT DISTINCT query FROM stl_scan 
  WHERE tbl <> [table_id] AND type = 2 AND userid > 1
  AND tbl IN (
    SELECT DISTINCT attrelid FROM pg_attribute 
    WHERE attisdistkey = true AND attsortkeyord > 0
    MINUS
    SELECT DISTINCT attrelid FROM pg_attribute
    WHERE attsortkeyord = -1)
  INTERSECT
  -- ...and that perform some type of join
  (SELECT DISTINCT query FROM stl_hashjoin WHERE userid > 1
  UNION
  SELECT DISTINCT query FROM stl_nestloop WHERE userid > 1
  UNION
  SELECT DISTINCT query FROM stl_mergejoin WHERE userid > 1)
);

If this query returns any results, you might be able to enable a MERGE JOIN for existing queries without modifying any other tables. If so, from here you can dive more deeply into those specific queries to identify whether the JOIN condition contains the DISTKEY and SORTKEY columns. The following query helps expedite this investigation by checking JOIN conditions for these queries from STL_EXPLAIN:

SELECT query, TRIM(info)
FROM stl_explain 
WHERE info LIKE '% Cond:%'
AND userid > 1 AND query IN (
  SELECT DISTINCT query FROM stl_scan 
  WHERE tbl = [table_id] AND type = 2 AND userid > 1
  INTERSECT
  SELECT DISTINCT query FROM stl_scan 
  WHERE tbl <> [table_id] AND type = 2 AND userid > 1
  AND tbl IN (
    SELECT DISTINCT attrelid FROM pg_attribute 
    WHERE attisdistkey = true AND attsortkeyord > 0
    MINUS
    SELECT DISTINCT attrelid FROM pg_attribute
    WHERE attsortkeyord = -1
  )
  INTERSECT
  (SELECT DISTINCT query FROM stl_hashjoin WHERE userid > 1
   UNION
   SELECT DISTINCT query FROM stl_nestloop WHERE userid > 1
   UNION
   SELECT DISTINCT query FROM stl_mergejoin WHERE userid > 1))
ORDER BY query;

If this query returns no results, you need to tune multiple tables simultaneously to facilitate a MERGE JOIN operation.

Note: If a desired MERGE JOIN optimization requires reviewing and modifying multiple tables, you need to approach the problem in a different fashion, one that goes beyond the scope of this article. If you’re interested in implementing such an optimization, you can check our documentation on the JOIN operations and ask specific questions in the comments at the end of this blog post.

Do I want a MERGE JOIN?

MERGE JOIN operations are the fastest of the three join operations, so in many situations they’re what you want. However, for certain tables and query patterns, it might be faster to use a HASH JOIN with a SORTKEY that benefits the filter condition rather than the join condition. For example, a predicate might prune a large number of blocks from the scan by filtering on a sorted column unrelated to the JOIN condition column.

To quantify this, consider a use case satisfied by two different schemas:

  • Tables lineitem and orders are related on column orderkey
  • You calculate total sales of all items that had these criteria:
    • Sold with a price over $104,500
    • Shipped on the same day the order was placed

SQL code that achieves this goal is as follows:

SELECT SUM(l_price) 
FROM lineitem l 
JOIN orders o ON o.orderkey = l.orderkey 
AND o.o_orderdate = l.l_shipdate 
AND l.l_price > 104500;

  • Approach 1 uses a MERGE JOIN optimization:
    • lineitem is distributed on orderkey and sorted on (l_shipdate, orderkey)
    • orders is distributed on orderkey and sorted on (o_orderdate, orderkey)
  • Approach 2 uses a HASH JOIN with filter optimization:
    • lineitem is distributed on orderkey and sorted on (l_price)
    • orders is distributed on orderkey and sorted on (o_orderdate, orderkey)
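
To make the two approaches concrete, here’s a hedged sketch of the lineitem DDL for each (the orders DDL is identical in both approaches, and the column list is simplified from TPC-H):

-- Approach 1: sort lineitem on the join-related columns to enable a MERGE JOIN
CREATE TABLE lineitem (
    orderkey   INTEGER,
    l_shipdate DATE,
    l_price    NUMERIC(12,2)
)
DISTKEY (orderkey)
COMPOUND SORTKEY (l_shipdate, orderkey);

-- Approach 2: sort lineitem on l_price so the selective filter
-- (l_price > 104500) prunes blocks, and accept a HASH JOIN
CREATE TABLE lineitem (
    orderkey   INTEGER,
    l_shipdate DATE,
    l_price    NUMERIC(12,2)
)
DISTKEY (orderkey)
COMPOUND SORTKEY (l_price);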

Which of these approaches is most effective depends on the size of the orders table and the selectivity of the filter on l_price.

When the query runs faster with approach 1, the benefit of the MERGE JOIN outweighs the I/O saved by pruning blocks in the scan operation. If the query runs faster with approach 2, the filter reduces the necessary scan I/O by more than the MERGE JOIN optimization would save.

By understanding the value of each approach, you can better estimate when a query pattern is better served by a MERGE JOIN, and when to focus on optimizing common filter conditions with a different SORTKEY.

Will sorting improve zone maps?

A zone map exists for each 1 MB block and consists of in-memory metadata that tracks the minimum and maximum values within the block. This metadata is accessed before a disk scan to identify which blocks are relevant to the query. For an unsorted column, these min-max ranges can overlap from block to block, reducing the effectiveness of the zone maps. In almost every case, a sort key improves the effectiveness of zone maps by physically ordering the data so that values progress in sorted order through the blocks of each column within each slice. However, there are some cases where a SORTKEY won’t improve zone maps:

  • Only one block with values exists on each slice for a given column.
    • If there is only one block per column per slice, that block has a minimum and maximum value that remain consistent, regardless of whether the data within that block is sorted or unsorted.
    • Example: A very small dimension table that contains fewer than a few thousand rows, where each column’s values fit entirely into the 1 MB block on each slice.
  • The values within the block are prefixed with a string 8 bytes or longer.
    • The minimum and maximum values in the block header allow for 8 bytes of data to track each value. For long strings, we take the first 8 characters as a prefix.
    • Example: I filter on a column that stores URLs that are always prefixed with ‘https://’, so the minimum and maximum values always share the same 8-character prefix, regardless of the rest of the URL.
  • The column contains a single distinct value.
    • If the minimum and maximum values are identical regardless of sort order, there’s no benefit to a sort key on the column.
    • Example: I have monthly time series tables where one of the columns contains the equivalent of DATE_PART('month', timestamp).

These scenarios are rare, but if they match your use case it’s good to know that defining a sort key won’t improve zone map effectiveness.
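
If you want to see zone maps for yourself, the per-block minimum and maximum values are exposed in the STV_BLOCKLIST system table (superuser access may be required). A quick sketch, using the same [table_id] placeholder as before:

-- Per-block min/max for one column of a table; heavily overlapping
-- ranges across blocks indicate ineffective zone maps.
-- Values are tracked as 8-byte integers, matching the 8-byte
-- string-prefix behavior described in the list above.
SELECT slice, col, blocknum, minvalue, maxvalue
FROM stv_blocklist
WHERE tbl = [table_id] AND col = 0
ORDER BY slice, blocknum;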

Are query patterns using zone maps?

Sorting data to improve the effectiveness of zone maps results in a performance boost only if you’re also issuing queries that use those zone maps. The following are cases where range-restricted scans can’t effectively use zone maps:

CREATE TABLE IF NOT EXISTS "public"."orders"
 (...  
    ,"o_totalprice" NUMERIC(12,2) NOT NULL  
    ,"o_ordertime" TIMESTAMP NOT NULL
    ,"o_orderdate" DATE NOT NULL
 ...) DISTSTYLE ...
 SORTKEY ("o_ordertime", "o_totalprice");

-- Filter doesn’t reference ordered columns:

SELECT * FROM orders;
SELECT * FROM orders WHERE o_shippriority = 5;

-- Filter processes the sortkey column with function:

SELECT * FROM orders 
WHERE DATEADD('d',5,o_ordertime) > '2016-09-09 01:30:00';

-- Filter requires an implicit or explicit cast:

SELECT * FROM orders WHERE o_ordertime::date='2016-09-09'; -- Explicit
SELECT * FROM orders WHERE o_ordertime = '2016-09-09';     -- Implicit

-- Filter results in a non-selective range restriction. 

root@redshift/tpch=# SELECT MIN(o_ordertime) FROM orders;
    min
------------
 1993-01-01
(1 row)
SELECT * FROM orders WHERE o_ordertime > '1992-01-01 1:30:00';
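
Where possible, rewrite such filters so the comparison happens against the bare sort key column. For example, these hedged variants of two of the preceding anti-patterns allow the zone maps to be used:

-- Move the date arithmetic onto the constant instead of the column:
SELECT * FROM orders
WHERE o_ordertime > DATEADD('d', -5, '2016-09-09 01:30:00'::timestamp);

-- Replace the cast or equality on a timestamp with a range predicate:
SELECT * FROM orders
WHERE o_ordertime >= '2016-09-09' AND o_ordertime < '2016-09-10';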

If you avoid patterns such as these, range-restricted scans benefit from zone maps.

Do query patterns use repetitive filters on similar column groups?

Each table can have a single combination of columns defined as the sort key. Barring the materialization of duplicate tables sorted on differing criteria, you must know how your tables are accessed to use this table property most effectively for your workload. You can easily identify your table filter patterns by using the Amazon Redshift system tables, where this data is automatically logged for historical workloads. The following query gives insight into how your table is filtered:

SELECT 
    ti.schema||'.'||ti."table" AS tablename, 
    COUNT(DISTINCT i.query) AS number_queries,
    MAX(i.query) AS example_query, TRIM(info) AS filter
FROM stl_explain p
JOIN stl_plan_info i USING (userid,query,nodeid)
JOIN stl_scan s USING (userid,query,segment,step) 
JOIN svv_table_info ti ON ti.table_id = s.tbl
WHERE s.type = 2 AND s.userid > 1 
AND p.info LIKE 'Filter:%' AND p.nodeid > 0
AND s.tbl = [table_id]
GROUP BY 1,4 ORDER BY 2 DESC, 1;

If you identify a common pattern of filter criteria on a similar group of columns, it suggests a potential column combination for the sort key definition. If the patterns show filters that seem to be quite random, a single compound sort key definition might not benefit the cluster.

In some cases, a table is accessed with varying filter criteria. In certain of these circumstances, the filters are a function of one another, and the sorting order of one column can benefit the zone maps of another column not defined in the sort key. For example, if I have a column txn_ts with the value 2016-09-09 10:00:00 and a column txn_dt with the value 2016-09-09, then txn_dt is clearly a direct function of txn_ts. These are technically differing filter criteria, but they mutually benefit one another, so the difference can be disregarded.

More significant are cases where two business units access the same tables in different ways. For example, say the marketing department accesses my orders table when interested in sale dates and customer demographics. The business use case for this is to perform complex analytics on various market segments, to run and assess the effectiveness of targeted advertising campaigns in near-real time. Meanwhile, the operations manager of a distribution center might be interested in weekly metrics to track packages shipped. In this example, one use case clearly adds much more business value than the other.

Note: When investigating historical workloads using the system tables like this, pay attention to the patterns that emerge from the queries and operations that are important to your business. That is, prioritize optimizations on frequently recurring patterns issued by scheduled jobs and known workloads over attempting to optimize exploratory ad hoc queries. The latter often consist of unoptimized SQL that adds little value to the business.

Can you VACUUM REINDEX when needed?

When using an INTERLEAVED SORTKEY, the sorted order of data is determined based on metadata called z-indexes and z-compressors, as described in this Wikipedia entry. This metadata is built when you initially load data into an empty table with COPY or CREATE TABLE AS (CTAS) operations, and it can be rebuilt for a table using the VACUUM REINDEX command. If your load strategy doesn’t involve initially loading a significant portion of the data into the table with COPY or CTAS (such as a load with INSERT INTO SELECT), then you need to VACUUM REINDEX your tables to fully benefit from the INTERLEAVED SORTKEY you defined.

VACUUM REINDEX can be a very expensive maintenance task, depending on the size of your table. If you’re not loading with COPY and can’t tolerate the cost of the VACUUM REINDEX operation, you shouldn’t use INTERLEAVED SORTKEYs.
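
To gauge whether an interleaved table currently needs reindexing, you can check SVV_INTERLEAVED_COLUMNS; per the documentation’s guidance, an interleaved_skew value greater than 1.4 usually means a VACUUM REINDEX will improve performance:

-- Interleaved sort key columns with significant skew since the last reindex
SELECT tbl, col, interleaved_skew, last_reindex
FROM svv_interleaved_columns
WHERE interleaved_skew > 1.4;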

Will sorting reduce sort steps?

It’s rare to make it this far through the flowchart without having found a benefit. However, if you use ORDER BY or GROUP BY clauses on otherwise full table scans, it’s worth specifying a COMPOUND sort key to reduce sorting at read time.

Doing so allows you to skip, or reduce the cost of, certain query processing operations. Such operations include ORDER BY and GROUP BY query clauses, and the PARTITION BY and ORDER BY clauses of a windowing aggregate function.
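
As a hypothetical illustration, if a page_views table were defined with COMPOUND SORTKEY (session_id, view_time), a windowed query that partitions and orders on those same columns can avoid, or greatly reduce the cost of, its sort step:

-- Rows are already physically ordered by (session_id, view_time),
-- so little or no additional sorting is needed for this window function
SELECT session_id, view_time,
       ROW_NUMBER() OVER (PARTITION BY session_id ORDER BY view_time) AS view_seq
FROM page_views;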

Determining the optimal SORTKEY column group

Once you’ve determined that either a compound or interleaved sort key works for your table, next you identify which column group to specify. This step is easy, because you’ve already identified the frequency of column filters when you answered the question “Do query patterns use repetitive filters on similar column groups?” earlier. There, I provided this query to help you understand your filter criteria:

SELECT 
    ti.schema||'.'||ti."table" AS tablename, 
    COUNT(DISTINCT i.query) AS number_queries,
    MAX(i.query) AS example_query, TRIM(info) AS filter
FROM stl_explain p
JOIN stl_plan_info i USING (userid,query,nodeid)
JOIN stl_scan s USING (userid,query,segment,step) 
JOIN svv_table_info ti ON ti.table_id = s.tbl
WHERE s.type = 2 AND s.userid > 1 
AND p.info LIKE 'Filter:%' AND p.nodeid > 0
AND s.tbl = [table_id]
GROUP BY 1,4 ORDER BY 2 DESC, 1;

By pairing the results of this query with the results of another query from the preceding post, you can see clearly which columns to prioritize for your column group:

SELECT 
    ti."table", ti.diststyle, RTRIM(a.attname) column_name,
    COUNT(DISTINCT s.query ||'-'|| s.segment ||'-'|| s.step) as num_scans,
    COUNT(DISTINCT CASE WHEN TRANSLATE(TRANSLATE(info,')',' '),'(',' ') LIKE ('Filter:% '|| a.attname ||' %') THEN s.query ||'-'|| s.segment ||'-'|| s.step END) AS column_filters
FROM stl_explain p
JOIN stl_plan_info i ON ( i.userid=p.userid AND i.query=p.query AND i.nodeid=p.nodeid  )
JOIN stl_scan s ON (s.userid=i.userid AND s.query=i.query AND s.segment=i.segment AND s.step=i.step)
JOIN svv_table_info ti ON ti.table_id=s.tbl
JOIN pg_attribute a ON (a.attrelid=s.tbl AND a.attnum > 0)
WHERE s.tbl = [table_id] 
GROUP BY 1,2,3,a.attnum
ORDER BY attnum;  

With this, you can easily visualize a complete view of how your table is filtered, and then define a column group to best facilitate your workload.

Next steps

Choosing optimal sort key configurations for your tables allows you to reduce disk I/O by improving zone map effectiveness, reduce compute overhead and I/O by reducing the cost of sort steps, and even enable MERGE JOIN operations. Each of these benefits helps your complex analytical workloads perform well over multipetabyte data sets.

By following the process detailed earlier, you can identify the ideal sort type and sort key column group for your specific tables. The final step is simply to rebuild the tables to apply these optimizations. This rebuild can be performed at any time. However, if you intend to continue working through the Advanced Table Design Playbook, you might want to wait until the end before you issue the table rebuilds. Otherwise, you might find yourself rebuilding these tables multiple times to implement optimizations identified in later installments.
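
When you do rebuild, one common approach is a deep copy via CTAS; the following is a minimal sketch, reusing the hypothetical orders table and the sort key from the earlier example:

-- 1. Deep copy into a new table that carries the new sort key
--    (CTAS loads into an empty table, so the result is fully sorted)
CREATE TABLE orders_new
DISTKEY (orderkey)
COMPOUND SORTKEY (o_orderdate, orderkey)
AS SELECT * FROM orders;

-- 2. Swap the tables
ALTER TABLE orders RENAME TO orders_old;
ALTER TABLE orders_new RENAME TO orders;
DROP TABLE orders_old;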

In Part 4 of this table design playbook, I’ll describe how to use column properties related to compression encoding for another significant performance gain and storage reduction.


Amazon Redshift Engineering’s Advanced Table Design Playbook

Part 1: Preamble, Prerequisites, and Prioritization
Part 2: Distribution Styles and Distribution Keys
Part 3: Compound and Interleaved Sort Keys
Part 4: Compression Encodings
Part 5: Table Data Durability


About the author


Zach Christopherson is a Palo Alto-based Senior Database Engineer at AWS.
He assists Amazon Redshift users from all industries in fine-tuning their workloads for optimal performance. As a member of the Amazon Redshift service team, he also influences and contributes to the development of new and existing service features. In his spare time, he enjoys trying new restaurants with his wife, Mary, and caring for his newborn daughter, Sophia.

 


Related

Top 10 Performance Tuning Techniques for Amazon Redshift (Updated Nov. 28, 2016)
