Speed up your ELT and BI queries with Amazon Redshift materialized views
The Amazon Redshift materialized views function helps you achieve significantly faster query performance on repeated or predictable workloads such as dashboard queries from Business Intelligence (BI) tools, such as Amazon QuickSight. It also speeds up and simplifies extract, load, and transform (ELT) data processing. You can use materialized views to store frequently used precomputations and seamlessly use them to achieve lower latency on subsequent analytical queries.
This post demonstrates how to create a materialized view, refresh it after data ingestion, and speed up your BI workload.
Setting up your sample datasets
This walkthrough uses the Amazon Customer Reviews Dataset. It is a public dataset stored in the
us-east-1 Region. You will create the following three tables:
- product_reviews – Contains customer reviews for a specific product
- customer – Contains customer profile data
- customer_address – Contains customer address information
The following diagram shows the relationship of the three tables.
To download the script and set up the tables, choose mv_blog.sql.
Creating and using materialized views
For this use case, your marketing team wants to build a report that shows how many customers per state like your products. You also want to drill down to each product category when needed.
In this first step, you create a regular view. See the following code:
The following code is a report to analyze the product review count per state:
The following code is a report to analyze the product review count per state for specific categories:
The preceding reports take approximately 4 seconds to run. As you sell more products and get more reviews, this elapsed time gradually gets longer. To speed up those reports, you can create a materialized view to precompute the count of reviews per product category and per state. See the following code:
The following code are the reports to analyze the product review against the materialized view.
The same reports against materialized views take less than 200 milliseconds because the new queries access precomputed joins, filters, grouping, and partial sums instead of the multiple, larger base tables.
Speeding up and simplifying ELT data processing
To achieve similar performance without the use of materialized views, many users use the CREATE TABLE AS (CTAS) command. However, as you update base tables with new data inserts, updates, or deletes, the CTAS tables become stale; you must recreate them to keep them up-to-date with the latest changes from the base tables. Now with Amazon Redshift materialized views, you can overcome this problem by efficiently and incrementally refreshing the materialized views with supported SQL. For example, the following code ingests another 10,000 reviews:
Now the materialized view is out-of-date. To refresh the materialized view, enter the following code:
Within 200 milliseconds, the materialized view is up-to-date again. Your report queries have the same consistent, fast performance.
The following screenshot is the query log that shows query performance. The log shows newer statements at the top.
The materialized views refresh is much faster because it’s incremental: Amazon Redshift only uses the new data to update the materialized view instead of recomputing the entire materialized view again from the base tables. For more information, see REFRESH MATERIALIZED VIEW.
Materialized views also simplify and make ELT easier and more efficient. Without materialized views, you might create an ELT job and use CTAS to precompute the product analysis data. The ELT job recomputes this data after new data is ingested and stores the data in the precomputed product analysis table to meet the dashboard latency requirement.
In particular, the ETL job drops and recreates the precomputed product analysis table after each ingestion. See the following code:
With materialized views, you just need to create the materialized view one time and refresh to keep it up-to-date. To refresh materialized views after ingesting new data, add
REFRESH MATERIALIZED VIEW to the ELT data ingestion scripts. Redshift will automatically and incrementally bring the materialized view up-to-date.
Achieving faster performance for BI dashboards
You can use materialized views to help your BI team build a dashboard to analyze product trends.
For example, to create a materialized view to join
customer_address dimension tables and precompute reviews and ratings, enter the following code:
You access materialized views the same as you do a regular table. For this walkthrough, choose a materialized view as the source for an Amazon QuickSight dataset. As showing by the following screenshot.
You can preview data of the materialized view in Amazon QuickSight to understand what information can be used to build the dashboard. The following screenshot shows the sample data of mv_product_analysis.
To track how many reviews customers post over time, use
review_date as the X-axis and
Sum(review_total) as the Y-axis. The following graph shows this visualization.
The following screenshot shows a complete dashboard “Product trend” that analyzes the top product category, product popularity by state, and more.
Because you are using materialized views, the product trend dashboard loads in seconds and is always up-to-date. You can gain the latest insights, understand customer purchase behavior, and identify business opportunities and optimizations.
You can compare the performance of materialized views with other possible alternatives, such as using regular views and using CTAS. The following graph shows the overall query execution for the product trend dashboard. Materialized views not only improve query performance by more than an order of magnitude compared to using a regular view, but also have low maintenance costs compared to using a CTAS because the incremental refresh time is proportional to the delta of changes in the base tables. In contrast, the CTAS recreate approach needs to processes all the data in the base tables.
The following animated gif shows the actual response time for the product trend dashboard built using Amazon QuickSight in direct query mode.
This post showed how to create Amazon Redshift materialized views with one or more base tables to speed up both BI queries and ELT. You can easily build and maintain efficient data processing pipelines and seamlessly extend the low latency query execution benefits of materialized views to data analysis.
About the Authors
Juan Yu is a Data Warehouse Specialist Solutions Architect at AWS.
Jose Kunnackal John is principal product manager for Amazon QuickSight, AWS’ cloud-native, fully managed BI service. Jose started his career with Motorola, writing software for telecom and first responder systems. Later he was Director of Engineering at Trilibis Mobile, where he built a SaaS mobile web platform using AWS services. Jose is excited by the potential of cloud technologies and looks forward to helping customers with their transition to the cloud.