Amazon Redshift Introduces Late Materialization for Faster Query Processing

Posted on: Dec 21, 2017

Amazon Redshift now uses late materialization to reduce the amount of data scanned and improve performance for queries with predicate filters. 

Late materialization (row-level filtering) reduces I/O for queries with filters by batching and factoring in the filtering of predicate(s) before fetching data blocks in the next column. For example, consider a query with filters on two columns operating on a CUSTOMER_DETAIL table having a million rows: 

With late materialization, Redshift fetches a batch of data from the CUSTOMER_SINCE_DATE and CUSTOMER_STATUS_LEVEL columns, then applies the respective predicates. If only 10% of the CUSTOMER_DETAIL table rows satisfy the predicate filters, Redshift can potentially save 90% of the I/O for remaining columns, thus improving query performance. Also, by scanning columns in batches rather than a row at a time, performance is further improved. 

With this release, we have added a new ‘is_rlf_scan’ column to the STL_SCAN table. If a query uses late materialization, then this column will have a value of 't' or TRUE; otherwise, it will have a value of 'f' or FALSE. 

Amazon Redshift enables late materialization by default. It is automatically triggered for a query when Redshift detects that the query can improve performance. With the latest build 1.0.1583, late materialization is available in all AWS regions.