AWS Database Blog

How to use DynamoDB global secondary indexes to improve query performance and reduce costs

Whether you’re an experienced Amazon DynamoDB user or just starting out, you’re probably familiar with the basics of the DynamoDB primary key model. This key model provides consistent single-digit millisecond performance at any scale. If your application needs efficient access to data using attributes other than the primary key, that’s where global secondary indexes come in handy. As an example, take querying an employee table that has a primary key of employee_id to find all employees in a particular department.

However, global secondary indexes are capable of more than multiple attribute queries—they’re actually one of the most versatile tools in DynamoDB. Recently, there’s been a limit increase on the maximum number of global secondary indexes per table from 5 to 20. Given that, now is a great time to learn how to use them to optimize your DynamoDB usage.

Think of a global secondary index as a separate DynamoDB table that contains a subset of attributes from the source table. It also contains an alternate primary key to support query operations. Changes made to the source table are propagated to the global secondary index with eventual consistency (similar to a materialized view in a relational database). This happens without affecting the performance or availability of the source table. For more information, see Global Secondary Indexes.

In this post, I demonstrate several ways to use global secondary indexes to query your data, accelerate your application’s performance, and reduce your monthly DynamoDB bill.

Usage patterns for global secondary indexes

Some common usage patterns for global secondary indexes follow.

Querying and sorting data by multiple attributes

Querying and sorting data by multiple attributes is the most common usage pattern when creating global secondary indexes. With this usage pattern, you create a global secondary index on an attribute that is not part of the table’s primary keys to support queries on that attribute.

Global secondary indexes support three types of queries:

  • Filtered data retrieval: Query data by the partition key and an optional sort key. You can create global secondary indexes on any table attributes with the filtering attribute as its primary key. Creating a global secondary index avoids scanning the full table and then filtering data.
  • Sorted data retrieval: Sort data on the sort key for global secondary indexes with both a partition key and a sort key. Global secondary index query results are always sorted by the sort key value. Any attribute requiring sorted data retrieval can be made the sort key of the global secondary index. The order of query results can be controlled using the ScanIndexForward  parameter in the query.
  • Range-based data retrieval: Include queries such as BEGINSWITH, BETWEEN, GREATER THAN, and LESS THAN on the sort key of the global secondary index. Global secondary index queries provide support for KeyConditionExpressions that enable range-based data retrieval.

Let’s say an online music streaming service has a Songs table storing information about all the songs the service offers. The information stored about a song includes SingerId, SingerName, Genre, LaunchYear, Album, SongDuration, NumberOfViews, and the NumberOfLikes of the song. The table’s partition key is SongId, which is a random string. The SongId uniquely identifies each song. The table doesn’t have a sort key because there is exactly one record per SongId. The following table shows the schema of the database table.

Table with the schema of the DynamoDB tableThe following table contains a list of queries needed for the music streaming service’s application. For each, it includes the candidate primary key attribute for the global secondary index and the rationale for using it.

Query Candidate Primary Key Attributes  for the
Global Secondary Index
Rationale
  Partition Key Sort Key  
Find all songs for a SingerId SingerId You don’t need a sort key because the partition key is sufficient to answer the query.

Global secondary indexes are optimized for uniform key distribution of items across its partitions.

Keeping SingerId as the partition key results in even key distribution across partitions. It also keeps all songs for a SingerId collocated in global secondary index partitions to make queries by SingerId efficient.

Find all songs of a genre ordered by number of likes Genre NumberOfLikes Because global secondary indexes store data in the order of the sort key, keeping NumberOfLikes the sort key lets you retrieve data sorted by NumberOfLikes.
Find most-viewed songs of a SingerId launched in 2010 SingerId LaunchYear:Views

Keeping SingerId as the partition key provides even distribution of data across global secondary index partitions.

Concatenating multiple attributes in the sort key allows querying data on multiple attributes. It also keeps the data sorted in the order of the concatenated sort key. In this case, using LaunchYear:Views as the sort key keeps data sorted by both year of launch and number of views.

Find all songs of a singer 2010-2014 SingerId LaunchYear SingerId as the partition key provides support for querying on it. Having LaunchYear as the sort key lets you do range queries on it using BETWEEN KeyConditionExpression.

Data filtering

Sometimes you need to filter the data you queried from your table or an existing global secondary index before using it in your application. In such cases you will incur the cost for reading all the items that you read before the filtering is applied.

The data filtering pattern is useful when the read cost of querying your table or using an existing global secondary index is higher than the write cost of maintaining an already-filtered view of the data as a new global secondary index.

Let’s say a food delivery service stores pending orders in a DynamoDB table called Orders. The table’s partition key is RestaurantId and its sort key is OrderId. The status of orders can be FoodInPreparation, ReadyForPickup, or OutForDelivery. The following table shows the schema of this DynamoDB table.

The delivery service needs to assign orders with the OrderStatus of ReadyForPickup so that delivery drivers know when orders are ready to be picked up. The table already supports querying all open orders for a restaurant by RestaurantId, which is the partition key with the filter clause of OrderStatus as ReadyForPickup.

Suppose that the delivery service queries all open orders every 30 seconds to minimize customer wait time. In this case, querying the table leads to significantly more items being read as opposed to maintaining a filtered view of orders that are in ReadyForPickup status. This is because a number of items will be in the FoodInPreparation and OutForDelivery states, which are being read from the table unnecessarily.

For this food delivery service, the following table includes possible global secondary index schemas the food delivery service can use to store a filtered view in the global secondary index along with the recommended schema.

Query

 

Candidate Primary Key Attributes for
Global Secondary Index
Rationale Recommendation
  Partition Key Sort Key    
Find orders for a RestaurantId with OrderStatus of ReadyForPickup RestaurantId ReadyForPickup

Keeping RestaurantId in the partition key of the global secondary index keeps the data well distributed across global secondary index partitions, which allows maximum throughput utilization.

Maintaining a separate flag for ReadyForPickup orders results in a sparse index and reduces the overall index size. An OrderStatus other than ReadyForPickup is not maintained in the global secondary index. This reduces read and write consumed units per second and storage cost on the global secondary index.

This is the recommended schema for the global secondary index. It results in well-distributed data across global secondary index partitions and also reduces read and write costs by storing filtered data.
RestaurantId OrderStatus

This schema provides the same benefit of data distribution as the previous global secondary index partitioning approach.

Keeping the sort key as the OrderStatus maintains data for all the statuses. Thus, you increase write capacity units and storage cost on the global secondary index.

 

Not recommended because of higher write capacity units consumption and storage cost.
OrderStatus RestaurantId Keeping OrderStatus as the partition key stores data across restaurants with the same order status collocated in a partition. This approach results in hot spots while accessing data for different restaurants simultaneously. Having heavily requested hot spots leads to inefficient use of provisioned read and write capacity, ultimately resulting in throttling. Similar to the previous approach, this partitioning scheme also requires more write capacity units and storage cost because all order statuses are being maintained in the global secondary index. Not recommended because of poor data distribution across global secondary index partitions.

Reduce read capacity unit (RCU) consumption for large items

In some cases, the items in your table might have a large number of attributes but the application query requires reading only a small subset of these attributes. In these cases, querying data from the table consumes more read capacity units because of the large item size. This pattern is useful with read-heavy workloads of a particular type. Such read-heavy workloads occur when the price of querying data from the table is much higher than the price of maintaining only required attributes in a separate global secondary index.

This pattern is particularly useful with count queries. You use such queries to find the total number of sort keys under a partition key. In these queries, you aren’t interested in the data of each item, but only need a count of the total number of items. Maintaining a separate keys-only global secondary index in such cases significantly reduces the per-item size being read.

Let’s say online shopping platform stores product information in a DynamoDB table called Products. The shopping platform displays these products on its website by querying this table. The partition key of the table is ProductId. The ProductId uniquely identifies a product, so the table doesn’t have a sort key. Each item in the table is written once but read a hundred times. This makes it a read-heavy workload.

The following table shows the schema of the DynamoDB table.

Table with the schema of the DynamoDB table

The size of each item in the table is 100 KB, but to display the preview of the product on the website, the company needs only a few attributes — ProductName, ProductDescription, and Price. These add up to less than 4 KB. The Products table already supports accessing the required attributes. However, it consumes several times the read capacity units per item compared to maintaining a global secondary index with only the fields to be displayed on the website. Having a separate global secondary index in this case helps deliver a significant reduction in read costs.

Isolating read workloads

In this usage pattern, you create the global secondary index on the same primary keys as the table. However, the read capacity units are provisioned separately to cater to the specific reader. Use this pattern when the data in your table has multiple access patterns and isolation is required to prevent the access patterns from affecting each other.

One common scenario here includes data shared between high-priority user requests and low-priority offline processing. To avoid low-priority offline requests from throttling high-priority user requests on the table, you can offload offline requests to scan and query the global secondary index instead of the table.

Commons use cases and scenarios

You can find some common use cases and scenarios for global secondary indexes following.

Internet of Things (IoT) applications

AWS customers deploy IoT applications across a range of use cases, including connected homes, healthcare, connected vehicles, industrial machines, and retail. Most of these applications share a common pattern of data ingestion, and they require real-time analytics and monitoring capabilities. By using global secondary indexes, you can monitor IoT applications in real time with minimal management.

As an example, consider an industrial application that runs in a number of buildings, each with hundreds of IoT sensors. Each sensor senses data across various dimensions and updates the data in a DynamoDB table. The table stores the latest value of each metric.

Table storing the latest value of each metric

This IoT application needs real-time monitoring and alarming if any of the metrics breaches its configured threshold. To achieve this, create a global secondary index on each of these dimensions and periodically query the index for values greater than the threshold for each metric. You can create a global secondary index with BuildingId as the partition key for each global secondary index and the metric name as the sort key.

In some cases, the IoT application might need to start measuring other metrics such as Speed and need an alarm if the machines are running too fast. In these cases, you can add the metric as a table attribute and create a global secondary index on it.

Advertising technology

You can use global secondary indexes in a number of advertising technology use cases such as ad targeting, real-time bidding (RTB), and ad personalization. DynamoDB provides the performance and availability required for these use cases.

Let’s look at an RTB use case to see how you can solve it by using global secondary indexes. RTB requires making optimum decisions to determine the value of each ad before bidding on it. Typically, the time between calculating bids and bidding is 100 milliseconds. You can use DynamoDB to store user profile information that includes the categories to which a specific user belongs as well as the times the user was assigned a given segment.

You can use this user information as part of bidding-decision logic. By using a global secondary index with User Id as the partition key and Category as the sort key, you can query user data in real time by the category of the ad.

Gaming

Speed and scalability are crucial aspects of gaming applications. These applications typically require single-digit millisecond reads and writes to avoid latency during game play. You can use global secondary indexes to handle such scenarios.

Global secondary indexes can help to create a highest-score leaderboard, social graphs for players, and in chat messages. The ease of creating and removing a global secondary index when a new query arises allows rapid development and reduces time to market.

Summary

Global secondary indexes enhance the querying capability of DynamoDB. This post shows how you can use global secondary indexes along with patterns such as data filtering and data ordering to achieve read isolation and reduce query costs. The recent limit increase of the maximum number of global secondary indexes per DynamoDB table from 5 to 20 can help you apply these usage patterns without worrying about hitting limits.

If you have questions about how to get the most out of global secondary indexes, comment below or tweet us at @DynamoDB.

 


About the Author

Shubham's photoShubham Sethi is a software development engineer at Amazon Web Services.