AWS Business Intelligence Blog
Best practices for Amazon QuickSight SPICE and direct query mode
Amazon QuickSight is a scalable, serverless, machine learning (ML)-powered business intelligence (BI) service. It supports data sources from Amazon Web Services (AWS), other cloud vendors, and on-premises. Some common data sources being used in QuickSight are Amazon Simple Storage Service (Amazon S3), Amazon Athena, Amazon Relational Database Service (Amazon RDS), and Amazon Redshift. See Supported data sources for more supported sources. After connecting QuickSight to a data source, you can create datasets on top of the data source. Each dataset can use either SPICE (Super-fast, Parallel, In-memory Calculation Engine) or direct query mode. Authors can then use datasets to create interactive dashboards, paginated reports, and Q topics to support natural language querying on top of their data.
In QuickSight, data is queried from datasets when visuals load within analyses, dashboards, reports, exports, in responses to questions asked in natural language to Amazon Q, or when threshold alerts are being evaluated. Direct queries are sent to the underlying data source every time a request is made. Using SPICE, a refreshable snapshot of the data is cached in QuickSight, and all queries are fulfilled using the latest snapshot in SPICE, no longer connecting to the underlying data source.
The following diagram illustrates the difference between QuickSight query modes. For example, when connecting to a Redshift data source, QuickSight can ingest the data from Redshift into SPICE or query it directly from Redshift.
In this post, we will explore the benefits and factors to consider when using SPICE and direct query mode. Afterwards, we will also discuss when and how to use which query mode most efficiently in different scenarios.
Benefits of using SPICE
SPICE is an in-memory calculation engine designed to help accelerate dashboard performance in QuickSight. Data can be ingested from the data source into SPICE based on a schedule configured through the user interface (UI), on-demand using the UI, or programmatically using refresh APIs. SPICE uses a combination of columnar storage, in-memory technologies enabled through the latest hardware innovations, and machine code generation to run interactive queries on large datasets and get rapid responses. When SPICE datasets are refreshing, dependent dashboards present data of the previous snapshot so that users can interact with the dashboard without interruption.
Alongside enhanced performance, SPICE helps reduce the workload on a data source by querying it only when data ingestion is required. SPICE usage is solely charged based on the capacity in gigabytes (GB) purchased to store data. The consumption of data from those datasets across any number of dashboards and users doesn’t incur charges. Each Author, Author Pro, Admin, and Admin Pro license includes 10 GB of SPICE capacity with the possibility to purchase additional capacity as required at an account level. When a SPICE dataset is accessed concurrently by a large number of users across their dashboards, SPICE automatically scales for and manages the entire workload to provide consistently fast response times to the entire user base. By shifting the workload from the data source to SPICE, you get the benefit of cost efficiency and improved dashboard performance.
In terms of security, data stored in SPICE is encrypted at rest. By default, the SPICE encryption key is managed by AWS. Alternatively, SPICE allows encryption of data at rest using customer managed keys stored in AWS Key Management Service (AWS KMS). This provides you with the tools to audit access to data and satisfy regulatory security requirements. You can immediately lock down access to your data by revoking access to AWS KMS keys. Data transfers from the data source to SPICE and from SPICE to the user interface are encrypted in transit by using Transport Layer Security (TLS).
Factors to consider when using SPICE
The data stored in a SPICE dataset is a snapshot of your data. To fetch the latest data from the source and refresh a SPICE dataset, you can configure a schedule or use an event-driven approach.
SPICE supports programmatic invocation of ingestion operation APIs. A common use case is to use these APIs to invoke a SPICE data refresh once extract, transform, and load (ETL) processes have been completed and data is stored in the data lake or data warehouse. An on-demand, full refresh can be initiated 32 times within a 24-hour period per dataset. This on-demand refresh process involves initiating a data ingestion either through the QuickSight UI or using the QuickSight API call CreateIngestion. An on-demand, incremental refresh can be initiated 100 times within a 24-hour period per dataset. A point to note here is that you cannot initiate an on-demand full refresh or incremental refresh after the limit has been reached. However, this doesn’t impact schedules that are configured; they will continue to run as planned. For more details about full and incremental refresh, see Keep your data fresh with incremental SPICE refresh.
SPICE accommodates up to 1 billion rows or 1 TB of data, whichever comes first for each dataset. Each field can store up to 2,047 Unicode characters, each column name can support 127 Unicode characters. For more details on data source quotas for SPICE, see data source quotas. As your data volume grows, it’s important to manage the growth proactively on your end. This will help ensure that SPICE limits aren’t exceeded during the data ingestion process, which could otherwise lead to data ingestion failures. To estimate SPICE size, see estimating the size of SPICE datasets.
Another important consideration is the total SPICE capacity, which is allocated at an account level. In the following screenshot, the remaining available capacity is limited to 162 GB. If this available capacity is consumed by either a single dataset or multiple datasets, this will lead to data ingestion failures. To prevent such failures, you can manually purchase additional SPICE capacity from the QuickSight UI as needed. Alternatively, you can enable the Auto-purchase capacity option, which allows SPICE to automatically acquire the necessary capacity and so avoid data ingestion failures.
Benefits of using direct query mode
Direct query mode allows near real-time querying of the data source. When a user opens or refreshes a dashboard, QuickSight sends the queries directly to the data source. The data source is then responsible for running the queries, and returning the results, which are immediately visualized in the dashboard. If the response time from a data source is fast, using direct query mode can provide the benefits of both the freshest data on permanently updated data sources and high-performing dashboards.
Direct query mode doesn’t have a limitation on the number of rows or the data size for a dataset. This makes direct query mode suitable for use cases where the data size exceeds the row or size limits of SPICE.
For slow running queries, you can use dataset parameters to filter the data and reduce its data volume, which help to accelerate query performance. When users interact with the dashboard, the selections and actions they make in controls, filters, and visuals can be propagated to the data sources through live, custom, parameterized SQL queries. See Optimize queries using dataset parameters in Amazon QuickSight for more details.
Factors to consider when using direct query mode
Because direct query mode sends queries in near real time to the data source when a user is accessing the dashboard, the performance will rely on the runtime at the data source. If the query takes a long time to run, the dashboard user will experience longer wait times for visuals to render. Generally, these wait times are longer when using direct query mode compared to using a SPICE dataset. To ensure a good user experience, it’s recommended to use a data source with fast response times when using direct query mode.
It’s important to note that QuickSight enforces a 2-minute timeout for generating a dashboard visual. This timeout applies to both direct query mode and SPICE. However, not all database drivers react to the 2-minute timeout and cancel the query. You might need to cancel the query in the data source either manually or programmatically. Direct query mode supports up to 127 Unicode characters for each column name and 2,000 columns for each dataset. Data source-specific timeouts will also apply and those vary for each data source type.
Finally, as the data volume grows, the query response time can slow down, requiring you to scale up the capacity of the source, which can incur additional costs. This is particularly important when dealing with a large number of frequent dashboard users, because the increased concurrency from the higher volume of queries in direct query mode can overwhelm the data source’s ability to handle the load.
Suggested query mode in different scenarios
This section discusses common scenarios for dashboards with varying latency requirements and describes how SPICE and direct query mode can be used to address their needs. Additional key factors to consider include data volume, data update frequency, and data source performance.
Dashboard data updates daily
It’s common that dashboard data needs to be updated daily to reflect yesterday’s information. In these cases, the data is typically extracted from source systems, transformed, and loaded into a data lake or data warehouse by batch ETL processes that are often run outside of business hours. If the data size is within SPICE quotas, the recommended approach varies based on how the data is updated in the source system. For scenarios where the individual records are updated in the source system, SPICE with a daily full refresh is a suitable choice. However, if the data is primarily appended, then SPICE with a daily incremental refresh would be a better option to handle the new additions of the data. If the ETL process is known to be completed at a certain time of the day, the data refresh in SPICE can be scheduled daily at a fixed time to align with the ETL process. On the other hand, if the ETL processing time is unpredictable, a more flexible approach can be used. In such cases, the SPICE refresh can be invoked programmatically using the QuickSight API after the ETL process has completed. To learn more, see Event-driven refresh of SPICE dataset in Amazon QuickSight.
Dashboard data updates every 15 minutes
If dashboard data needs to be more real-time and new data is appended with a data column showing the creation date and time, you can schedule a SPICE incremental refresh instead of a full refresh. The refresh interval can be as frequent as every 15 minutes.
If you need the dashboard data to be updated more frequently—such as every 10 minutes—while still using SPICE to enhance dashboard performance, you can combine the different data refresh approaches (full refresh, incremental refresh, and API refresh) on the same dataset. For instance, you can schedule incremental refresh for every 15 minutes and invoke the data refresh API between the scheduled incremental refresh time intervals. To provide the freshest data during business hours, all 32 API invocations that can be made within a 24-hour period could be used within the 8 business hours in which users will actually consume data through the dashboard.
The following table shows an example of refreshing a SPICE dataset by combining multiple refresh methods. In this example, data will be refreshed every 7 or 8 minutes between 9 AM and 5 PM, so you need to make sure the data refresh can finish in less than 7 minutes. After 5 PM, because there are no refreshes launched through the API, the dataset will follow the incremental refresh schedule and be updated every 15 minutes.
Time | SPICE refresh approach |
08:45 AM | Scheduled incremental refresh |
09:00 AM | Scheduled incremental refresh |
09:08 AM | Incremental refresh launched using API |
09:15 AM | Scheduled incremental refresh |
09:23 AM | Incremental refresh launched using API |
09:30 AM | Scheduled incremental refresh |
. . . |
. .. |
04:38 PM | Incremental refresh launched using API |
04:45 PM | Scheduled incremental refresh |
04:53 PM | Incremental refresh launched using API |
05:00 PM | Scheduled incremental refresh |
05:15 PM | Scheduled incremental refresh |
Figure 3 Timeline of combined SPICE refresh methods
Dashboard uses a mix of daily and near real-time data
A dashboard often shows different kinds of data with different dimensions and time granularity. For example, a dashboard landing page often shows high level key performance indicators (KPIs), monthly trends, and aggregated data for different dimensions. A second sheet then drills down into more details, providing a near real-time view at a transactional level. QuickSight supports the usage of multiple datasets in one dashboard. In this scenario, the dashboard landing page provides aggregated data with fast performance. You can create a SPICE dataset and schedule it to update daily. In the second sheet, data needs to always be fresh from the data source and you can set up a direct query mode dataset. You need to be aware that this approach might lead to data inconsistencies across different pages, because one page uses daily refreshed data while the other uses near real-time data. You can put a note in the dashboard to illustrate the data refresh data in a different page.
The following screenshot shows an analysis using a mix of SPICE and direct query mode datasets.
Dashboard data needs to be retrieved in near real-time from the data source
When dashboards provide access to business-critical operational data, they often must be displayed instantly on the visuals to reduce any unnecessary delay in taking crucial decisions and actions. Direct query mode is the best choice to display data in near real time, because QuickSight queries data for each visual directly from the data source every time the dashboard is loaded. This ensures that users always receive the latest data from the source.
Dashboard data from temporarily unavailable sources
If your data source is only running occasionally or your ETL process involves rebuilding tables, it can cause a temporary disruption and unavailability of data in the dashboards. To mitigate these effects, you can load the data into SPICE using the ingestion API when the source is available. This ensures a smooth and consistent dashboard performance because SPICE stores a snapshot of the data and decouples the dashboards from the source.
Conclusion
In this post, we discussed the strengths and considerations when using SPICE and direct query mode and how to choose a query mode in QuickSight depending on the scenario-specific requirements and circumstances. A key decision factor is the data latency requirement in the dashboard. After deciding on the right combination of query modes and refresh methods for your SPICE datasets, see Tips and tricks for high-performant dashboards in Amazon QuickSight to learn more.
About the Authors
Roy Yung is a Specialist Solutions Architect for Amazon QuickSight. Roy has over 10 years of experience implementing enterprise business intelligence (BI) solutions. Prior to AWS, Roy delivered BI and data platform solutions in the insurance, banking, aviation and retail industries.
Koushik Muthanna Koravanda Ganapathy is a Specialist Solutions Architect for Amazon QuickSight at AWS. He works with customers providing guidance on implementing QuickSight at scale so that they can get value from analyzing business data. Koushik enjoys spending time outdoors, travelling and swimming.