Amazon Redshift Introduces Late Materialization for Faster Query Processing
Amazon Redshift now uses late materialization to reduce the amount of data scanned and improve performance for queries with predicate filters.
Late materialization (row-level filtering) reduces I/O for queries with filters by batching and factoring in the filtering of predicate(s) before fetching data blocks in the next column. For example, consider a query with filters on two columns operating on a CUSTOMER_DETAIL table having a million rows:
With late materialization, Redshift fetches a batch of data from the CUSTOMER_SINCE_DATE and CUSTOMER_STATUS_LEVEL columns, then applies the respective predicates. If only 10% of the CUSTOMER_DETAIL table rows satisfy the predicate filters, Redshift can potentially save 90% of the I/O for remaining columns, thus improving query performance. Also, by scanning columns in batches rather than a row at a time, performance is further improved.
With this release, we have added a new ‘is_rlf_scan’ column to the STL_SCAN table. If a query uses late materialization, then this column will have a value of 't' or TRUE; otherwise, it will have a value of 'f' or FALSE.
Amazon Redshift enables late materialization by default. It is automatically triggered for a query when Redshift detects that the query can improve performance. With the latest build 1.0.1583, late materialization is available in all AWS regions.