AWS Big Data Blog
Create advanced insights using Level Aware Aggregations in Amazon QuickSight
Amazon QuickSight recently launched Level Aware Aggregations (LAA), which enable you to perform calculations on your data to derive advanced and meaningful insights. In this blog post, we go through examples of applying these calculations to a sample sales dataset so that you can start using them for your own needs.
What are Level Aware Aggregations?
Level aware aggregations are aggregation calculations that can be computed at a desired level in the overall query evaluation order of QuickSight. For details, see the QuickSight documentation on Order of Evaluation. Up until now, the only types of aggregations possible in QuickSight were Display-level and Table calculation aggregation types.
- Display-level aggregations are aggregations that are defined by the dimensions and metrics present in the field wells of a QuickSight visual.
- Table calculations are computed by windowing/rolling-up over the display-level aggregated values of the visual. Hence, by definition, these are calculated after the Display-level aggregations are computed.
With Level Aware Aggregations, QuickSight now allows you to aggregate values before the Display-level aggregation. For more information, please visit the Level Aware Aggregations documentation.
Customer use cases
Distribution of customers by lifetime orders
Customer question: How many customers have made one order, two orders, three orders, and so forth?
In this case, we first want to aggregate the total number of orders made by each customer, then use the output of that as a visual dimension. This isn’t feasible to compute without LAA.
Solution using LAA
1.) Compute the number of orders per customer.
Calculated field name : NumberOrdersPerCustomer
Calculated field expression : countOver({order_id}, [{Customer Id}], PRE_AGG)
This computes the number of orders per customer, before the display-level aggregation of the visual.
2.) Create the visual.
Create the visual with the above field NumberOrdersPerCustomer in the “X-Axis” well of the Field Wells. Add “Count Distinct” of “Customer Id” in the “Value” section of the Field Wells to create a histogram of the number of orders made by customers.
As we can see, there are around 5000 unique customers with one order, around 3500 customers with two orders, and so on.
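To make the two steps above concrete, here is a minimal Python sketch of the same idea outside QuickSight. The rows and the variable names (`orders`, `orders_per_customer`, `histogram`) are hypothetical and only illustrate the logic; they are not taken from the sample dataset, and the sketch does not claim to mirror how QuickSight runs the query internally.

```python
from collections import Counter

# Hypothetical (order_id, customer_id) rows; not the post's sample dataset.
orders = [
    (1, "A"), (2, "A"), (3, "B"),
    (4, "C"), (5, "C"), (6, "C"), (7, "D"),
]

# Step 1: similar in spirit to countOver({order_id}, [{Customer Id}], PRE_AGG),
# attach the per-customer order count to every row before any grouping.
orders_per_customer = Counter(customer for _, customer in orders)
rows = [
    (order_id, customer, orders_per_customer[customer])
    for order_id, customer in orders
]

# Step 2: use that count as the grouping dimension and count the distinct
# customers that fall into each group.
customers_by_count = {}
for _, customer, n_orders in rows:
    customers_by_count.setdefault(n_orders, set()).add(customer)

histogram = {n: len(group) for n, group in sorted(customers_by_count.items())}
print(histogram)  # {1: 2, 2: 1, 3: 1}
```

The key point is that the order count exists on each row before the grouping happens, which is what lets it serve as a dimension.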
Filter out customers based on lifetime spends
Customer Question: How do I filter out customers with lifetime spend less than $100,000? My visual’s dimension (group by) and metric definitions are independent of total spend per customer.
If the group dimensions of the aggregation in question (spend per customer) are exactly the same as the group dimensions in the field well, the customer can achieve this using the aggregated filters feature. But that’s not always the case. As mentioned in the customer question, the visual’s definition can be different from the filter’s aggregation.
Solution using LAA
1.) Compute sum of sales per customer.
Calculated field name : salesPerCustomer
Calculated field expression : sumOver(sales,[{customer_id}],PRE_AGG)
PRE_AGG indicates that the computation must occur before display-level aggregation.
2.) Create the visuals.
The visual on the left shows sum of sales per segment and the visual on the right shows the total number of customers. Note that there are no filters applied at this point.
3.) Create the filter on salesPerCustomer.
Create a filter on top of the above field salesPerCustomer to select items greater than $100,000.
4.) Apply the filter.
The above image shows applying the filter on “salesPerCustomer” greater than $100,000.
With the filter applied, we have excluded the customers whose total spend is less than $100,000, regardless of what we choose to display in the visuals.
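The sketch below walks through the same sequence in plain Python: compute the per-customer total on every row, filter on it, and only then aggregate for the visuals. The rows, the `THRESHOLD` constant, and the variable names are hypothetical and chosen for illustration.

```python
from collections import defaultdict

# Hypothetical (customer_id, segment, sales) rows; not the post's dataset.
rows = [
    ("A", "Enterprise", 80_000.0),
    ("A", "SMB", 40_000.0),
    ("B", "SMB", 30_000.0),
    ("C", "Enterprise", 150_000.0),
]
THRESHOLD = 100_000.0

# Step 1: similar in spirit to sumOver(sales, [{customer_id}], PRE_AGG),
# compute the lifetime spend of each customer across all of their rows.
sales_per_customer = defaultdict(float)
for customer, _, sales in rows:
    sales_per_customer[customer] += sales

# Steps 3 and 4: keep only rows whose customer total exceeds the threshold.
kept = [row for row in rows if sales_per_customer[row[0]] > THRESHOLD]

# Step 2: the visuals group by something unrelated to the filter (segment),
# yet they only see rows from customers that passed the filter.
sales_per_segment = defaultdict(float)
for _, segment, sales in kept:
    sales_per_segment[segment] += sales
sales_per_segment = dict(sales_per_segment)
customer_count = len({customer for customer, _, _ in kept})

print(sales_per_segment)  # {'Enterprise': 230000.0, 'SMB': 40000.0}
print(customer_count)     # 2
```

Customer A stays in both segments because the filter looks at A's total spend of 120,000, not at the individual rows.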
Fixed percent of total sales even with filters applied
Customer Question: How much is the contribution of each industry to the entire company’s profit (percent of total)? I don’t want the total to recompute when filters are applied.
The existing table calculation function percentOfTotal isn’t able to solve this problem, since filters on categories are applied before computing the total. Using percentOfTotal would recalculate the total every time filters are applied. We need a solution that doesn’t consider the filtering when computing the total.
Solution using LAA
1.) Compute total sales before filters through a calculated field.
Calculated field name : totalSalesBeforeFilters
Calculated field expression : sumOver(sales,[],PRE_FILTER)
PRE_FILTER indicates that this computation must be done prior to applying filters.
The partition dimension list (second argument) is empty since we want to compute the overall total.
2.) Compute the fixed percent of total sales.
Calculated field name : fixedPercentOfTotal
Calculated field expression : sum(sales) / min(totalSalesBeforeFilters)
Note: totalSalesBeforeFilters is the same for every row of the unaggregated data. Since we want to use it post-aggregation, we apply the aggregation min on top of it. Because all values are the same, max or avg would serve the same purpose.
3.) Create the visual.
Add the “industry” field to the “Rows” well. Add “sales (SUM)” and “fixedPercentOfTotal” to the “Values” section. Now the percent of total metric remains fixed even if we filter the data on any underlying dimension or measure.
The visual shows sales per industry along with percent of total, computed using the table calculation percentOfTotal and using Level Aware Aggregation as described above. Both the percent of total values are currently the same since there aren’t any filters applied.
The visual shows the same metrics, but with the data filtered down to only 5 industries. As we can see, “Percent of total sales” is readjusted to represent only the filtered data, whereas “Fixed Percent of total sales” remains the same even after filtering. Both metrics address valuable customer use cases that are now feasible through QuickSight.
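The difference between the two metrics can be illustrated with a small Python sketch. The data and names (`rows`, `selected`, `result`) are hypothetical; the sketch assumes only what the text states, namely that the fixed total is computed before filters and the regular percent of total is computed after them.

```python
# Hypothetical (industry, sales) rows; not the post's dataset.
rows = [("Tech", 500.0), ("Retail", 300.0), ("Energy", 200.0)]

# Similar in spirit to sumOver(sales, [], PRE_FILTER): one grand total,
# computed before any filter and carried on every row.
total_before_filters = sum(sales for _, sales in rows)

# Apply a filter that keeps only some industries.
selected = {"Tech", "Retail"}
filtered = [
    (industry, sales, total_before_filters)
    for industry, sales in rows
    if industry in selected
]

# The regular percent of total uses the total of the filtered rows only.
filtered_total = sum(sales for _, sales, _ in filtered)

result = {}
for industry in sorted({industry for industry, _, _ in filtered}):
    group = [(s, t) for i, s, t in filtered if i == industry]
    sum_sales = sum(s for s, _ in group)
    # sum(sales) / min(totalSalesBeforeFilters): min just picks the one
    # value that every row in the group shares.
    fixed_percent = sum_sales / min(t for _, t in group)
    recomputed_percent = sum_sales / filtered_total
    result[industry] = (fixed_percent, recomputed_percent)

for industry, (fixed, recomputed) in result.items():
    print(f"{industry}: fixed={fixed:.3f} recomputed={recomputed:.3f}")
```

With this data, Tech keeps a fixed share of 0.500 while its recomputed share rises to 0.625 once Energy is filtered out.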
Compare sales in a category to industry average
Customer question: How do I compare sales in a category to the industry average? I want the industry average to include all categories even after filtering.
Since we want the industry average to stay fixed even with filtering, we need PRE_FILTER aggregation to achieve this.
Solution using LAA
1.) Compute the industry average.
Calculated field name : IndustryAverage
Calculated field expression : avgOver(sumOver(sales,[{category}],PRE_FILTER),[],PRE_FILTER)
We first compute the sum of sales per category and then average it across all categories. It’s important to note here that we first computed a finer level aggregation and fed that into a coarser level aggregation.
2.) Compute the difference from IndustryAverage.
Calculated field name : FixedDifferenceFromIndustryAverage
Calculated field expression : sum(sales) - min(IndustryAverage)
As in the previous example, IndustryAverage is the same for every row, so we apply the min aggregation to carry that value into the aggregated expression.
3.) Create the visual.
Create the visual by adding “Category” in “X axis” field well and SUM(Sales), IndustryAverage and FixedDifferenceFromIndustryAverage as the values in a bar chart.
The visual shows total sales per category, the industry average across all categories, and each category’s difference from that average.
This visual shows the same metrics, but with categories filtered to include only 6 of them. As we can see, the industry average remained the same before and after filtering, keeping the difference the same whether you choose to show all categories, some of them, or just one.
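The following Python sketch follows the prose description above: one total per category, then a plain average of those totals, both computed on the unfiltered data. It is an illustration under that assumption and does not attempt to reproduce how QuickSight evaluates the nested expression row by row. The data and the helper `category_totals` are hypothetical.

```python
from collections import defaultdict

# Hypothetical (category, sales) rows; not the post's dataset.
rows = [
    ("Chairs", 100.0), ("Chairs", 200.0),
    ("Tables", 500.0), ("Lamps", 100.0),
]


def category_totals(data):
    """Sum sales per category (the finer-level aggregation)."""
    totals = defaultdict(float)
    for category, sales in data:
        totals[category] += sales
    return dict(totals)


# Finer level first (sum per category), then the coarser level (average of
# those sums), both on the unfiltered rows.
all_totals = category_totals(rows)
industry_average = sum(all_totals.values()) / len(all_totals)

# Filter the visual down to a subset of categories.
visible = {"Tables", "Lamps"}
filtered_totals = category_totals([r for r in rows if r[0] in visible])

# The difference uses the fixed average, so it does not move with the filter.
difference = {c: total - industry_average for c, total in filtered_totals.items()}

print(industry_average)  # 300.0
print(difference)        # {'Tables': 200.0, 'Lamps': -200.0}
```

Had the average been recomputed from the two visible categories, it would have been 300.0 only by coincidence of the data; here it is fixed by construction.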
Categorize customers based on lifetime spend
Customer question: How do I classify customers based on cumulative sales contribution? I then want to use that classification as my visual’s grouping.
The objective here is to create custom-sized bins to classify customers. Even though we could do this classification after display-level aggregation, we wouldn’t be able to use it as a dimension/group by in the visual.
Solution using LAA
1.) Compute sales per customer before display-level aggregation.
Calculated field name : salesPerCustomer
Calculated field expression : sumOver({sales amount},[{customer id}],PRE_AGG)
2.) Categorize Customers.
Calculated field name : Customer Category
Calculated field expression : ifelse(salesPerCustomer < 1000, "VERY_LOW", salesPerCustomer < 10000, "LOW", salesPerCustomer < 100000, "MEDIUM", "HIGH")
3.) Create the visual.
Create the visual by adding “Customer Category” to the “Y-axis” field well, “Count Distinct” of “customer id” to the value field well.
The image above shows the number of unique customers per Customer Category.
Filtering can be done on top of these categories as well to build other relevant visuals, since the categories are tagged before aggregation.
The image above shows the number of unique customers per Customer Category, split by gender.
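As a rough illustration of the binning steps above, the Python sketch below tags each customer with a category based on lifetime spend and then counts distinct customers per category. The thresholds come from the ifelse expression; the rows and the helper name `customer_category` are hypothetical.

```python
from collections import defaultdict

# Hypothetical (customer_id, sales_amount) rows; not the post's dataset.
rows = [
    ("A", 400.0), ("A", 300.0),
    ("B", 5_000.0),
    ("C", 60_000.0), ("C", 50_000.0),
    ("D", 900.0),
]


def customer_category(total):
    """Mirror the ifelse thresholds: the first matching condition wins."""
    if total < 1_000:
        return "VERY_LOW"
    if total < 10_000:
        return "LOW"
    if total < 100_000:
        return "MEDIUM"
    return "HIGH"


# Step 1: lifetime spend per customer, computed before any visual grouping.
sales_per_customer = defaultdict(float)
for customer, amount in rows:
    sales_per_customer[customer] += amount

# Step 2: tag every row with its customer's category.
tagged = [(c, customer_category(sales_per_customer[c])) for c, _ in rows]

# Step 3: group by category and count distinct customers.
members = defaultdict(set)
for customer, category in tagged:
    members[category].add(customer)
customers_per_category = {cat: len(ids) for cat, ids in members.items()}

print(customers_per_category)  # {'VERY_LOW': 2, 'LOW': 1, 'HIGH': 1}
```

Customer C lands in HIGH because the category is based on the combined 110,000, even though neither individual row reaches 100,000.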
Availability
Level aware aggregations are available in both Standard and Enterprise editions, in all supported AWS Regions. For more information, see the Amazon QuickSight documentation.
About the Author
Arun Baskar is a software development engineer for QuickSight at Amazon Web Services.