AWS Database Blog

Getting started with Amazon DynamoDB

Amazon DynamoDB is a key-value and document database purpose-built for single-digit millisecond performance at any scale. It’s a fully managed, multiregion, multimaster database with built-in security, backup and restore, and in-memory caching for internet-scale applications.

This post reviews the fundamentals you need to know to get started. We’re going to create a table and learn about the primary considerations when designing a DynamoDB table. Next, we insert a few sample items, and lastly we query them. The sample items come from a real-life example: the Amazon Customer Reviews Dataset from the Registry of Open Data on AWS. This guide walks you through how the business and technical requirements of the dataset inform the DynamoDB design. This post assumes no prerequisite knowledge.

To start, let’s break down our use case. For this post, this dataset represents the production reviews for your ecommerce website, which serves millions of users a day. You likely want fast access to reviews for a specific product, and also want users to be able to look at all of the reviews they’ve posted.

Both of these access patterns need fast lookups against a large dataset, which makes DynamoDB a good fit.

Creating a table

The first step is creating a table.

  1. In the AWS Management Console, under Find Services, enter and select DynamoDB, as shown in the screenshot preview below.
  2. Choose Create table, as shown in the image below.
  3. For Table name, enter a name, such as product_reviews.

    You now have some important decisions to make. DynamoDB uses keys to organize and distribute your data across multiple instances. There are two basic key types:
    – Partition key – This is a value that DynamoDB uses for an internal hash function to determine the physical storage location of the item to be stored.
    – Sort key – This is used to sort items with matching partition keys. You can concatenate multiple values into a composite sort key.

    Choose a partition key that results in data being as evenly distributed as possible in order to avoid hot keys. For more information about partition key design and avoiding hot keys, see Designing Partition Keys to Distribute Your Workload Evenly in the documentation. If you do additional reading on DynamoDB and see hash and range keys, these are earlier terms for partition and sort keys.

    DynamoDB tables require a primary key that uniquely identifies each item in the table. There are two types of primary keys:
     – Simple key – This is a partition key by itself.
     – Composite key – This is a partition key and a sort key.

    Whichever one you choose, each item stored must have a unique primary key.

    Let’s now take a moment to consider how our dataset and primary use case map to the primary key concepts we just laid out. The core use of this table is to maintain all customer reviews; your most common access pattern is retrieving all reviews for a single product.

    To retrieve a set of reviews for a product, first navigate to a product. You must have the product_id, which indicates that you want to query reviews based on that key. Secondly, when navigating to the review section, it would likely be convenient to sort in descending order by review_date.

    Each item in this dataset contains a customer_id, review_id, product_id, review_date, and review_body.

    You need a unique primary key to make sure that there are no primary key conflicts; use a composite sort key that includes review_id.

    To accomplish this, create your table with a partition key of product_id and a composite sort key of review_date and review_id. You could concatenate any GUID to ensure uniqueness.

  4. For Partition key, enter product_id.
  5. Select the Add sort key check box.
  6. For the sort key, enter review_date-review_id.
    The following screenshot demonstrates steps 3-6.
    Now that we have a way to query reviews for a single product, let’s think about a secondary query we could perform against this dataset. You might want to search for all reviews from a single customer, or you may want to see what related products a customer liked.

    DynamoDB works best with denormalized table design. Denormalizing your tables allows for fast access to data by optimizing for multiple different access patterns. Just as you want to query for reviews for a specific product, some applications may need to perform queries using a variety of query criteria; for example, looking up reviews by a specific customer. To accomplish this, create a global secondary index (GSI). For more information, see Global Secondary Indexes.

    The GSI has a partition key of customer_id and a sort key of review_id.

    Optionally, to pre-sort the list by date, we could use review_date in the sort key instead. The following steps take this approach and use review_date as the sort key of the GSI.

  7. In Table settings, deselect the Use default settings check box.
  8. In Secondary indexes, choose Add index.
  9. In the Add index pop-up, for Partition key, enter customer_id.
  10. Select the Add sort key check box.
  11. Enter review_date.
  12. Leave Index name and Projected attributes as is.
  13. Choose Add index.
    The following screenshot demonstrates steps 9–13.
    The next step provisions your table’s capacity. There are two options. On-demand allows you to pay per request and not specify read or write capacity units. This is useful for tables with unknown workloads, unpredictable application traffic, or if you prefer the ease of pay-per-request models. Provisioned allows you to specify read and write capacity units as well as an auto scaling threshold.
  14. In Read/write capacity mode, select the Provisioned check box.
  15. In Provisioned capacity, for Read capacity units and Write capacity units, enter 5.
  16. In Auto Scaling, deselect the Read capacity and Write capacity check boxes.
    The following screenshot demonstrates steps 14–16.
    DynamoDB encrypts all of your data transparently, and offers three modes for encryption:
    – AWS owned CMK – DynamoDB owns the key (no additional charge). This is the default encryption type.
    – AWS managed CMK – AWS KMS manages the key, which is stored in your account (AWS KMS charges apply).
    – Customer managed CMK – The key is stored in your account and is created, owned, and managed by you. You have full control over the CMK (AWS KMS charges apply).

    This post uses the default setting.

  17. In Encryption At Rest, select the DEFAULT check box, as shown in the screenshot preview below.
  18. Select Create, as shown in the screenshot below.
    You see the message Table is being created for a few minutes. After table creation is complete, you can insert reviews from your dataset.
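
The console steps above correspond to a single CreateTable request. The following is a minimal sketch of that request, written as keyword arguments for the boto3 create_table call. It is an illustration under assumptions, not part of the original walkthrough: it assumes every key attribute is stored as a string (type S) and that the GSI gets the same 5 read and 5 write capacity units as the table. The helper name build_create_table_request is hypothetical.

```python
def build_create_table_request(table_name="product_reviews"):
    """Return keyword arguments for the DynamoDB CreateTable API."""
    # Assumption: 5/5 capacity units for both the table and the GSI.
    throughput = {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5}
    return {
        "TableName": table_name,
        # Only attributes that appear in a key schema are declared here.
        # Assumption: all of them are strings ("S").
        "AttributeDefinitions": [
            {"AttributeName": "product_id", "AttributeType": "S"},
            {"AttributeName": "review_date-review_id", "AttributeType": "S"},
            {"AttributeName": "customer_id", "AttributeType": "S"},
            {"AttributeName": "review_date", "AttributeType": "S"},
        ],
        # Base table: partition (HASH) key plus composite sort (RANGE) key.
        "KeySchema": [
            {"AttributeName": "product_id", "KeyType": "HASH"},
            {"AttributeName": "review_date-review_id", "KeyType": "RANGE"},
        ],
        # GSI for the "all reviews by one customer" access pattern.
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "customer_id-review_date-index",
                "KeySchema": [
                    {"AttributeName": "customer_id", "KeyType": "HASH"},
                    {"AttributeName": "review_date", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
                "ProvisionedThroughput": dict(throughput),
            }
        ],
        "BillingMode": "PROVISIONED",
        "ProvisionedThroughput": dict(throughput),
    }
```

To create the table from code rather than the console, you could pass the result to a boto3 client, for example boto3.client("dynamodb").create_table(**build_create_table_request()). Leaving out SSESpecification keeps the default encryption with an AWS owned key, and the request does not configure auto scaling.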

Adding items to your new table

Next, we’re going to bulk insert reviews using a Lambda function from the Serverless Application Repository. This is a custom piece of code, but you can think of this as the API endpoint that could be used for your ecommerce website.

  1. Navigate to the Lambda console.
  2. Click Create function. You may have landed on a slightly different page than in the screenshot below. If you don’t see Create function, click Functions in the side bar.
  3. Select Browse serverless app repository.
  4. Enter getting-started-with-dynamodb-reviews-inserter into the search box.
  5. Select the Show apps that create custom IAM roles or resource policies check box.
  6. Click on the getting-started-with-dynamodb-reviews-inserter.
    The following screenshot demonstrates steps 3–6.
  7. Enter product_reviews for DynamodbDBTableName. This must match the name that we gave our table in Step 3 of the Creating a table section.
  8. Select I acknowledge that this app creates custom IAM roles.
  9. Click Deploy.
    The following screenshot demonstrates steps 7–9.
  10. On the next screen, you’ll see the message Your application is being deployed. Wait for the application to deploy.
  11. When you see Your application has been deployed in the green box, click ReviewInserter, as shown in the screenshot below. This redirects you to the deployed Lambda function.
  12. Scroll down to the Function code section and double-click the reviews.json file in the file navigator on the left side to open it.
  13. Select the reviews.json tab in the code editor.
  14. Select and copy the entire contents of the file to your clipboard.
  15. In the top right, select the drop-down and then click Configure test events.
    The following screenshot demonstrates steps 12–15.
  16. Enter an Event name such as myTestEvent.
  17. Replace the existing event with the code you copied from the reviews.json file.
  18. Click Create.
    The following screenshot demonstrates steps 16–18.
  19. Click Test, as shown in the screenshot below.
    This will take a few moments to run. We’ve now inserted our reviews into our DynamoDB table.
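
The code of the ReviewInserter function is not reproduced in this post. As a rough sketch of what an insert path for this table might look like, the example below adds the composite sort key attribute to each review and writes the reviews in batches. It assumes that review_date is an ISO 8601 string (YYYY-MM-DD), so that string order matches date order, and that table is a boto3 Table resource such as boto3.resource("dynamodb").Table("product_reviews"). The helper names to_item and insert_reviews are hypothetical.

```python
def to_item(review):
    """Return a copy of a review with the composite sort key attribute added."""
    item = dict(review)
    # Assumption: review_date is an ISO 8601 string (YYYY-MM-DD), so string
    # order matches chronological order within a product's partition.
    item["review_date-review_id"] = f"{review['review_date']}-{review['review_id']}"
    return item


def insert_reviews(table, reviews):
    """Write reviews through a boto3 Table resource; return how many were sent."""
    count = 0
    # batch_writer groups the puts into BatchWriteItem calls and resends
    # any items that DynamoDB reports as unprocessed.
    with table.batch_writer() as batch:
        for review in reviews:
            batch.put_item(Item=to_item(review))
            count += 1
    return count
```

Because review_id is part of the sort key, two reviews of the same product on the same date still get distinct primary keys.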

Querying your table

To query your table, complete the following steps:

  1. In the DynamoDB console, from the menu, choose Tables.
  2. Select the table you created.
  3. Choose the Items tab.
    You can see all of the product reviews that your Lambda function inserted.

    There are two ways to select data from a DynamoDB table in the console: scan and query. A scan reads every single item in the table, so you don’t generally use it for regular transactional workloads, but it is helpful for browsing the items in your table. A query allows you to enter search parameters based on the index that you are querying.

    The following screenshot shows the list of product reviews:
    The initial items that you see are a scan of the base table.

  4. From the actions drop-down, select Query.
  5. Select [Table] product_reviews: product_id, review_date-review_id.
  6. For product_id, enter B00JG8GOWU.
  7. Choose Start search.
    You can see reviews for the Kindle Paperwhite. The customer with ID 38942812 left a positive review, and you can see what other products they recommend.
  8. Switch to the GSI you created.
  9. From the drop-down menu, select Query.
  10. Select [Index] customer_id-review_date-index: customer_id, review_date.
  11. For customer_id, enter 38942812.
  12. Choose Start search.
    The following screenshot shows the options for your query:
    DynamoDB works best when you choose a partition key that results in evenly distributed queries across the entire table. You might encounter a situation in which demand for a particular key becomes excessively high. The product_title field shows that B00JG8GOWU is a Kindle Paperwhite. If this product went on sale, consumers might search for Kindle reviews more often than other products. For read-heavy workloads like your reviews dataset, use Amazon DynamoDB Accelerator (DAX), a fully managed caching service for DynamoDB.

    DAX is API-compatible with DynamoDB. This makes setting up DAX as easy as creating a cluster in the DynamoDB console and changing your query endpoint to the DAX cluster.
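
The two console queries above can also be expressed as Query API requests. The sketch below builds the keyword arguments for the low-level boto3 client query call, one set for the base table and one for the GSI. It assumes the key attributes are stored as strings (type S); the helper names are hypothetical. Setting ScanIndexForward to False returns items in descending sort key order, which gives the newest reviews first when the sort key starts with an ISO 8601 date.

```python
def product_reviews_query(product_id, newest_first=True):
    """Query arguments for all reviews of one product (base table)."""
    return {
        "TableName": "product_reviews",
        "KeyConditionExpression": "product_id = :pid",
        # Assumption: product_id is stored as a string attribute.
        "ExpressionAttributeValues": {":pid": {"S": product_id}},
        # False means descending order of the sort key.
        "ScanIndexForward": not newest_first,
    }


def customer_reviews_query(customer_id):
    """Query arguments for all reviews by one customer (GSI)."""
    return {
        "TableName": "product_reviews",
        "IndexName": "customer_id-review_date-index",
        "KeyConditionExpression": "customer_id = :cid",
        # Assumption: customer_id is stored as a string attribute.
        "ExpressionAttributeValues": {":cid": {"S": customer_id}},
    }
```

With a client created by boto3.client("dynamodb"), a call such as client.query(**product_reviews_query("B00JG8GOWU")) would read only the items in that product's partition, in contrast to a scan, which reads the whole table.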

Cleaning up

Delete the Lambda function and DynamoDB table that you created.

To delete the Lambda function:

  1. Navigate to the CloudFormation console.
  2. Select Stacks.
  3. Select the stack serverlessrepo-getting-started-with-dynamodb-reviews-inserter.
  4. Click Delete.

To delete the DynamoDB table:

  1. Navigate to the DynamoDB Console.
  2. Select the table and choose Delete.
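
If you prefer to clean up from code, the same two deletions might be sketched as follows. This is an illustration rather than part of the original steps: the function name clean_up is hypothetical, and the two arguments are assumed to be boto3 clients created with boto3.client("cloudformation") and boto3.client("dynamodb").

```python
def clean_up(cloudformation, dynamodb, stack_name, table_name):
    """Delete the deployed application stack, then the DynamoDB table."""
    # Deleting the CloudFormation stack removes the Lambda function
    # that the serverless application created.
    cloudformation.delete_stack(StackName=stack_name)
    cloudformation.get_waiter("stack_delete_complete").wait(StackName=stack_name)

    # Deleting the table also deletes its global secondary index.
    dynamodb.delete_table(TableName=table_name)
    dynamodb.get_waiter("table_not_exists").wait(TableName=table_name)
```

For this post, stack_name would be serverlessrepo-getting-started-with-dynamodb-reviews-inserter and table_name would be product_reviews.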

Summary

This post provided an overview of DynamoDB tables. You created a table, examined your dataset, and chose partition and sort keys. You set up a GSI to query data efficiently based on the patterns that your application needs. You also launched a Lambda function to populate your table with example reviews, and learned the differences between scans and queries and how to run them.

DynamoDB is one of many tools in your database toolbox. When you need a highly scalable, millisecond-latency, key-value database to bring your internet-scale applications online, check out DynamoDB. When you’re ready to dive deeper, check out the video AWS re:Invent 2018: Amazon DynamoDB Deep Dive: Advanced Design Patterns for DynamoDB (DAT401) on YouTube. For more advanced features, see Backup and Restore and Choosing the Right DynamoDB Partition Key.

 


About the Author

 

William Kalescky is a Solutions Architect with Amazon Web Services.