How to determine if Amazon DynamoDB is appropriate for your needs, and then plan your migration
AWS CTO Werner Vogels often jokes that AWS is in the business of “pain management for enterprises,” which gets to the root of many of the IT challenges AWS customers face. Simply asking “Where can we provide customers the most benefit?” often results in a discussion of databases and related license costs, performance and scalability management, and the cost of specialized database administrator labor. A fully managed NoSQL database service such as Amazon DynamoDB can address these challenges and provide many benefits.
In this blog post, I explain how to evaluate DynamoDB as a potential database service for your application, and then explain how to plan your migration to DynamoDB, including best practices.
At AWS, we believe in database freedom. We encourage you to evaluate and choose the right database for the type of data, access model, and scalability that you require. We also believe that your developers and system administrators should be free to build and operate their applications while leaving the operations management to AWS. This is particularly important for high-scale database operations.
Cloud-native applications are often architecturally different from legacy applications, and they are deployed regularly with loose coupling and multiple, auto-scaling compute tiers. It is important to take the time to consider your options and test-drive them as needed. Develop a belief in polyglot persistence—pick the right database technology for both your cloud-native and legacy applications. Resist the temptation to use the same database approach for every application development challenge.
AWS makes choosing and tuning the right database simpler and less expensive than commercial database engines by providing both ease of experimentation and transparent pricing. The available AWS database options are growing quickly in number and capability, driven in part by the adoption of microservices, the Internet of Things, and requirements for many types of specialized analytics. You can launch and manage most open source or third-party database engines through the AWS Marketplace and other repositories, and operate them on the appropriately sized Amazon EC2 instances or by using Amazon Relational Database Service (Amazon RDS).
Or you can let AWS handle the operations work and avoid database administrator costs by using a fully managed service such as DynamoDB. Werner Vogels observed that approximately 70 percent of Amazon.com database needs did not require a relational model and could be served better using a key-value store. This is why we conceived and built DynamoDB. DynamoDB is often used for low-scale operations because of its simplicity, but it also excels at ultrahigh-scale operations such as those demanded by Amazon.com.
Is DynamoDB right for your use case?
You should consider using DynamoDB if you:
- Have had scalability problems with other traditional database systems.
- Are actively engaged in developing an application or service. It doesn’t always make sense to migrate legacy applications that are not under development, unless you’re willing to invest time and effort to reimplement the data access layer, inline SQL code, or the stored procedures and functions of that application.
- Are working with an online transaction processing (OLTP) workload. High-performance reads and writes are easy to manage with DynamoDB, and you can expect performance that is effectively constant across widely varying loads.
- Are deploying a mission-critical application that must be highly available at all times without manual intervention.
- Are understaffed with respect to management of additional database capability and need to reduce the workload of your operations team.
- Require a high level of data durability, regardless of your backup-and-restore strategy.
- Have insufficient data for forecasting peaks and valleys in required database performance.
DynamoDB suitability guidelines
Before deciding to use DynamoDB, you should be able to answer “Yes” to most of the following evaluation questions:
- Can you organize your data in hierarchies or an aggregate structure in one or two tables?
- Is data protection important?
- Are traditional backups impractical or cost-prohibitive because of table update rate or overall data size?
- Does your database workload vary significantly by time of day or is it driven by a high growth rate or high-traffic events?
- Does your application or service consistently require single-digit millisecond response times, regardless of load and without tuning effort?
- Do you need to provide services in a scalable, replicated, or global configuration?
- Does your application need to store data in the high-terabyte size range?
- Are you willing to invest in a short but possibly steep NoSQL learning curve for your developers?
Some unsuitable workloads for DynamoDB include:
- Services that require ad hoc query access. Though it’s possible to use external relational frameworks to implement entity relationships across DynamoDB tables, these are generally cumbersome.
- Online analytical processing (OLAP)/data warehouse implementations. These types of applications generally require distribution and the joining of fact and dimension tables that inherently provide a normalized (relational) view of your data.
- Binary large object (BLOB) storage. DynamoDB can store binary items up to 400 KB, but DynamoDB is not generally suited to storing documents or images. A better architectural pattern for this implementation is to store pointers to Amazon S3 objects in a DynamoDB table.
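The pointer pattern from the last item can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed schema: the attribute names (`DocumentId`, `S3Bucket`, `S3Key`), the bucket name, and the size estimate are all assumptions made for the example. The item stores only metadata and the location of the object in Amazon S3, so it stays far below the 400 KB item limit.

```python
import json

MAX_ITEM_BYTES = 400 * 1024  # item size limit mentioned above (400 KB)


def build_pointer_item(document_id, bucket, key, size_bytes, content_type):
    """Return a DynamoDB-style item that points at an S3 object instead of embedding it.

    All attribute names here are hypothetical; choose names that fit your model.
    """
    return {
        "DocumentId": {"S": document_id},
        "S3Bucket": {"S": bucket},
        "S3Key": {"S": key},
        "SizeBytes": {"N": str(size_bytes)},
        "ContentType": {"S": content_type},
    }


def approximate_item_bytes(item):
    """Rough estimate only: the length of the JSON encoding, not DynamoDB's exact accounting."""
    return len(json.dumps(item).encode("utf-8"))


# A 25 MB PDF lives in S3; only the small pointer item goes into the table.
item = build_pointer_item(
    "doc-0001", "example-documents-bucket", "scans/doc-0001.pdf",
    25_000_000, "application/pdf",
)
```

The item produced here could be passed as the `Item` argument of a `put_item` call in the AWS SDK for Python (boto3).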
If after reviewing these considerations you decide that DynamoDB is suitable for your needs, you are ready to walk through the following migration planning section.
DynamoDB migration planning
When migrating a traditional database workload to DynamoDB, you should consider implementing an initial proof of concept. You also can run systems in parallel during a test phase to identify all of the variables not known during the planning phase. An iterative, agile approach is best. For a more detailed description of the migration planning steps in this section, see Best Practices for Migrating from an RDBMS to Amazon DynamoDB.
For simplicity, I assume that you’re migrating from an on-premises RDBMS to DynamoDB. However, other migration cases can also follow these steps.
1. Developer training
When migrating a data-driven system from a data center to the cloud, your first step is retraining your developers so they can make the switch from using embedded SQL in their code to making API calls to a NoSQL system such as DynamoDB.
Set aside training time for your developers, using the content in the Amazon DynamoDB Developer Resources. The developer resources include self-paced labs, customer examples, and high-quality training. A typical developer takes a few days to come up to speed on DynamoDB and needs a few more days to experiment and get their development tools set up for efficient coding and unit testing.
Note that DynamoDB local is a downloadable version of DynamoDB that your development team can use to write and test applications without accessing the DynamoDB web service. After your code is complete, you then make only a few changes for it to run against the web service. Using this approach also allows you to avoid the cost of using DynamoDB for development and testing.
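One way to keep the switch between DynamoDB local and the web service down to a single setting is to build the client arguments in one place. The sketch below assumes that DynamoDB local is listening on its default port (8000) on the developer's machine; the function name and the `use_local` flag are hypothetical.

```python
def dynamodb_client_kwargs(use_local, region="us-east-1", local_port=8000):
    """Build keyword arguments for an AWS SDK client (hypothetical helper).

    When use_local is True, the client is pointed at DynamoDB local instead of
    the regional web service endpoint.
    """
    kwargs = {"region_name": region}
    if use_local:
        # Assumption: DynamoDB local is running on this machine on local_port.
        kwargs["endpoint_url"] = f"http://localhost:{local_port}"
    return kwargs


# With boto3 installed, the result would be used like this:
#   import boto3
#   client = boto3.client("dynamodb", **dynamodb_client_kwargs(use_local=True))
local_kwargs = dynamodb_client_kwargs(use_local=True)
service_kwargs = dynamodb_client_kwargs(use_local=False)
```

Because only `endpoint_url` differs, the rest of the application code is identical in both environments.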
Your development team should also explore the use of local and global secondary indexes to support query optimization across the name-value pairs in your migrated tables. Global secondary indexes function transparently like external AWS-managed DynamoDB tables, and using them incurs additional cost. Use global secondary indexes only when necessary.
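To make the cost point concrete, the following sketch shows what a table definition with one global secondary index might look like, in the parameter shape accepted by the `create_table` operation in the AWS SDK for Python (boto3). The table, attribute, and index names and the capacity values are hypothetical. Note that the index carries its own provisioned throughput, which is why each additional global secondary index adds cost.

```python
def orders_table_definition():
    """Return create_table parameters for a hypothetical Orders table with one GSI."""
    throughput = {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5}  # placeholder values
    return {
        "TableName": "Orders",
        # Only attributes used in a key schema are declared here.
        "AttributeDefinitions": [
            {"AttributeName": "CustomerId", "AttributeType": "S"},
            {"AttributeName": "OrderDate", "AttributeType": "S"},
            {"AttributeName": "Status", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "CustomerId", "KeyType": "HASH"},
            {"AttributeName": "OrderDate", "KeyType": "RANGE"},
        ],
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "StatusByDate",
                "KeySchema": [
                    {"AttributeName": "Status", "KeyType": "HASH"},
                    {"AttributeName": "OrderDate", "KeyType": "RANGE"},
                ],
                # Projecting only the keys keeps the index small.
                "Projection": {"ProjectionType": "KEYS_ONLY"},
                # The index is provisioned (and billed) separately from the table.
                "ProvisionedThroughput": dict(throughput),
            }
        ],
        "ProvisionedThroughput": dict(throughput),
    }


definition = orders_table_definition()
# With boto3 installed: boto3.client("dynamodb").create_table(**definition)
```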
2. Data conversion
Depending on size and storage considerations, you should consider converting your existing tables to a single table in the source database before migration, to save time and effort later in the migration process. You also can construct a database job that outputs data on a periodic basis for migration to AWS. Review the tips for denormalization in First Steps for Modeling Relational Data in DynamoDB, which also covers relational modeling.
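As a rough illustration of what converting to a single table can mean, the sketch below folds rows from two relational tables into one collection of items that share a partition key. The `PK`/`SK` key names, the `CUSTOMER#`/`ORDER#` prefixes, and the sample rows are assumptions made for the example, not a required convention.

```python
# Sample rows as they might come out of two relational tables (made-up data).
customers = [{"customer_id": 1, "name": "Ana"}]
orders = [
    {"order_id": 10, "customer_id": 1, "total": "25.00"},
    {"order_id": 11, "customer_id": 1, "total": "40.00"},
]


def to_single_table_items(customers, orders):
    """Denormalize customers and their orders into items for one table.

    Each customer and each of that customer's orders share a partition key,
    so one query on PK returns the whole aggregate without a join.
    """
    items = []
    for customer in customers:
        items.append({
            "PK": f"CUSTOMER#{customer['customer_id']}",
            "SK": "PROFILE",
            "Name": customer["name"],
        })
    for order in orders:
        items.append({
            "PK": f"CUSTOMER#{order['customer_id']}",
            "SK": f"ORDER#{order['order_id']}",
            "Total": order["total"],
        })
    return items


items = to_single_table_items(customers, orders)
```

The same transformation could instead be expressed as a view or export job in the source database, as described above.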
To make the conversion process easier, third-party tools from several AWS partners are available in the AWS Marketplace for piping your data from an RDBMS and other sources into DynamoDB. AWS Marketplace tools have the advantage of being consumable by the hour, sending the billing events to your AWS invoice. When you are done using the AWS Marketplace solution, you can terminate its use and stop paying for the resource.
3. Data migration
When the denormalized table or exported source data is ready, consider whether the data will be migrated all at once in a batch, or in a batch with a last-minute synchronization step before you switch from the migration source to the target. AWS Database Migration Service (AWS DMS) treats DynamoDB as a migration target, with the source being a supported relational database, or Amazon S3 or MongoDB. Many of the migration scenarios that AWS DMS supports also can use AWS Snowball to move terabytes of data as an intermediate step. With AWS DMS, you pay only the cost of the Amazon EC2 instance that you use during the data movement and replication process. Most DynamoDB migration projects move data using AWS DMS, and then switch over after a period of replication and testing.
A general rule is that if your data migration takes more than one week given the bandwidth available for data movement, you should consider using Snowball for your initial migration. Snowball is a petabyte-scale data transport solution that uses devices designed to be secure to transfer large amounts of data into and out of the AWS Cloud. Using Snowball addresses common challenges with large-scale data transfers including high network costs, long transfer times, and security concerns. With Snowball, you move the data to the device on premises, which is then securely transported to an AWS data center. The contents of the device are then loaded into Amazon S3, and from there you can import the data into DynamoDB by using AWS Data Pipeline or AWS DMS in the manner previously described.
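The one-week rule is simple arithmetic, sketched below. The 80 percent link utilization is an assumption for the example (real links are rarely fully available for a migration), and the function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(data_bytes, link_bits_per_second, utilization=0.8):
    """Estimate how many days a network transfer takes.

    utilization is the assumed fraction of the link usable for the migration.
    """
    seconds = (data_bytes * 8) / (link_bits_per_second * utilization)
    return seconds / SECONDS_PER_DAY


def should_consider_snowball(data_bytes, link_bits_per_second, utilization=0.8):
    """Apply the rule of thumb above: more than one week suggests Snowball."""
    return transfer_days(data_bytes, link_bits_per_second, utilization) > 7


# 50 TB over a 100 Mbps link versus 1 TB over a 1 Gbps link.
slow = transfer_days(50e12, 100e6)
fast = transfer_days(1e12, 1e9)
```

Under these assumptions, 50 TB over 100 Mbps takes roughly 58 days, well past the one-week threshold, while 1 TB over 1 Gbps finishes in a few hours.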
4. Consistency model
One consideration for your workload is the required read consistency model following each table update. DynamoDB writes table updates in three Availability Zones in each AWS Region for durability, and all of your data usually is written in all locations in one second or less.
DynamoDB supports three models of read consistency: eventually consistent, strongly consistent, and transactional. Unless you specify a different consistency model for your application, DynamoDB uses eventually consistent reads. If your application and users can accept eventually consistent reads where a retrieval response may occasionally produce stale data based on timing, you can provision less capacity and your costs will be the lowest of the three consistency choices.
If you select strongly consistent reads, DynamoDB returns a response with the most up-to-date data from all successful write operations, and your provisioned capacity and costs are higher.
For transactional applications where the last write must always be available for all requests, the transactional APIs are the best choice. To support transactions and simplify the developer experience of making all-or-nothing table changes, DynamoDB performs two underlying reads or writes of every item in the transaction: one to prepare the transaction and one to commit the transaction. Although the cost of each read and write is the same, this drives up the total number of reads and writes for any given transactional change, and therefore is the most expensive table update option and read consistency model.
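The cost difference among the three models can be estimated with capacity unit arithmetic. The sketch below uses the standard DynamoDB sizing rules: one read capacity unit covers one strongly consistent read per second of an item up to 4 KB, an eventually consistent read consumes half of that, and a transactional read consumes double. One write capacity unit covers a write of up to 1 KB, and a transactional write consumes double. The function names are hypothetical.

```python
import math

READ_UNIT_BYTES = 4 * 1024   # one read unit covers up to 4 KB
WRITE_UNIT_BYTES = 1024      # one write unit covers up to 1 KB

# Multipliers relative to a strongly consistent read.
READ_MULTIPLIER = {"eventual": 0.5, "strong": 1.0, "transactional": 2.0}


def read_capacity_units(item_bytes, mode):
    """Capacity units consumed by reading one item in the given consistency mode."""
    return math.ceil(item_bytes / READ_UNIT_BYTES) * READ_MULTIPLIER[mode]


def write_capacity_units(item_bytes, transactional=False):
    """Capacity units consumed by writing one item."""
    units = math.ceil(item_bytes / WRITE_UNIT_BYTES)
    return units * 2 if transactional else units


# A 6,000-byte item spans two 4 KB read units and six 1 KB write units.
costs = {mode: read_capacity_units(6000, mode) for mode in READ_MULTIPLIER}
```

For the 6,000-byte item, an eventually consistent read consumes 1 unit, a strongly consistent read 2 units, and a transactional read 4 units, which is the ordering of cost described above.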
5. Security – encryption
If you use the migration approach recommended in this post, your data is encrypted in transit during the migration. When you create a new table in DynamoDB, encryption at rest is enabled by default. DynamoDB encryption at rest uses 256-bit Advanced Encryption Standard (AES-256), which helps protect your data from unauthorized access. The encryption at rest mechanism integrates with AWS Key Management Service (AWS KMS) for management of the encryption key that is used to encrypt your tables. This functionality eliminates the operational burden and complexity involved in protecting sensitive data. There is no performance reduction or cost burden associated with encryption.
When you create a new table, you choose one of two encryption options, each of which uses a customer master key (CMK) in AWS KMS to encrypt the table. The first option, which is the default encryption type, is an AWS-owned CMK. With this default option, DynamoDB uses a single service CMK to encrypt your tables. If this key doesn’t exist, it is created and managed for you by DynamoDB at no extra charge, and it cannot be disabled.
The second encryption option is called AWS managed CMK. If you choose this option, the CMK is created and stored in your account and is managed by AWS KMS (AWS KMS charges apply). With this option, you can view the CMK and its key policy in your AWS account, but you cannot change the key policy that was created for its use by DynamoDB. This second option has significant audit and security oversight value because you can review the encrypt and decrypt operations on the DynamoDB tables that you create by viewing the API calls made to AWS KMS in AWS CloudTrail logs.
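In the AWS SDK for Python (boto3), the choice between the two options is expressed through the `SSESpecification` parameter of `create_table`. The sketch below is a minimal illustration with hypothetical helper names: omitting the parameter leaves the table on the default AWS-owned CMK, and setting `Enabled` to true with the `KMS` type selects the AWS managed CMK.

```python
def sse_specification(use_aws_managed_cmk):
    """Return the SSESpecification for create_table, or None for the default."""
    if not use_aws_managed_cmk:
        # Default: the parameter is omitted and the AWS-owned CMK is used.
        return None
    # AWS managed CMK in your account (AWS KMS charges apply).
    return {"Enabled": True, "SSEType": "KMS"}


def with_encryption(table_definition, use_aws_managed_cmk):
    """Return a copy of create_table parameters with the encryption choice applied."""
    params = dict(table_definition)
    spec = sse_specification(use_aws_managed_cmk)
    if spec is not None:
        params["SSESpecification"] = spec
    return params


base = {"TableName": "Orders"}  # other create_table parameters omitted for brevity
default_params = with_encryption(base, use_aws_managed_cmk=False)
audited_params = with_encryption(base, use_aws_managed_cmk=True)
```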
By default, communications to and from DynamoDB use the HTTPS protocol, which protects network traffic by using SSL/TLS encryption, including those communications from the AWS Command Line Interface and AWS SDK.
6. Network security – VPC endpoint for DynamoDB
Most AWS customers will access DynamoDB from their VPC by using a secure VPC endpoint for DynamoDB. This alleviates security concerns about having to connect to DynamoDB from a private subnet via a network address translation gateway, or from a virtual private gateway or internet gateway. For optimum security in transit between your application tiers and DynamoDB, you should plan to enable VPC endpoints for DynamoDB.
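A VPC endpoint for DynamoDB is a gateway endpoint attached to the route tables of your subnets. The sketch below builds the parameters for the Amazon EC2 `create_vpc_endpoint` operation in the AWS SDK for Python (boto3). The helper name and the VPC and route table IDs are placeholders for illustration.

```python
def dynamodb_gateway_endpoint_params(vpc_id, route_table_ids, region):
    """Build create_vpc_endpoint parameters for a DynamoDB gateway endpoint."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        # The service name is regional.
        "ServiceName": f"com.amazonaws.{region}.dynamodb",
        # Traffic from subnets using these route tables goes through the endpoint.
        "RouteTableIds": list(route_table_ids),
    }


# Placeholder identifiers; substitute the IDs from your own VPC.
params = dynamodb_gateway_endpoint_params(
    "vpc-0123456789abcdef0", ["rtb-0123456789abcdef0"], "us-east-1"
)
# With boto3 installed: boto3.client("ec2").create_vpc_endpoint(**params)
```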
7. Performance – throughput and auto scaling
Managing the performance of DynamoDB requires very little effort. The service performs as expected with little variation in response time based on your throughput settings. You can use several of the newer capabilities of the service to automate performance planning and reduce the risk of errors in your throughput settings.
First, use the Throughput Capacity for Reads and Writes guide to choose your initial settings. Provisioning more capacity than you need drives your cost higher than necessary, and provisioning less than you need can degrade application performance.
Second, you should consider enabling auto scaling for your table so that you do not have to be concerned about incorrectly setting your read capacity units and write capacity units. With auto scaling, AWS moves capacity up and down based on the performance you need over time. You can set limits to scaling based on cost considerations, and then use Amazon CloudWatch and the DynamoDB console to watch the automation of auto scaling and be certain that you have not set the auto scaling upper limit too low. You should not have to manage auto scaling over time using this approach.
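Auto scaling for a table is configured through Application Auto Scaling in two steps: register the table's capacity as a scalable target with lower and upper limits, and attach a target tracking policy. The sketch below builds the parameters for the `register_scalable_target` and `put_scaling_policy` operations in the AWS SDK for Python (boto3) for read capacity. The helper name, policy name, limits, and 70 percent target utilization are assumptions for the example.

```python
def read_autoscaling_requests(table_name, min_units, max_units, target_utilization=70.0):
    """Build the two Application Auto Scaling requests for a table's read capacity."""
    if min_units > max_units:
        raise ValueError("min_units must not exceed max_units")
    common = {
        "ServiceNamespace": "dynamodb",
        "ResourceId": f"table/{table_name}",
        "ScalableDimension": "dynamodb:table:ReadCapacityUnits",
    }
    # Step 1: the limits that bound scaling (the upper limit also caps cost).
    target = dict(common, MinCapacity=min_units, MaxCapacity=max_units)
    # Step 2: track a utilization percentage between those limits.
    policy = dict(
        common,
        PolicyName=f"{table_name}-read-target-tracking",  # hypothetical name
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": target_utilization,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "DynamoDBReadCapacityUtilization"
            },
        },
    )
    return target, policy


target, policy = read_autoscaling_requests("Orders", min_units=5, max_units=500)
# With boto3 installed:
#   client = boto3.client("application-autoscaling")
#   client.register_scalable_target(**target)
#   client.put_scaling_policy(**policy)
```

Write capacity follows the same pattern with the `dynamodb:table:WriteCapacityUnits` dimension and the `DynamoDBWriteCapacityUtilization` metric.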
You also might decide to switch to DynamoDB on-demand capacity mode and allow AWS to manage all scaling activities for you.
8. Required performance – microseconds versus milliseconds?
If your workload requires extremely high performance with a response time measured in microseconds rather than single-digit milliseconds, you may want to test Amazon DynamoDB Accelerator (DAX). You can use DAX in-memory acceleration to lower the required DynamoDB throughput capacity of a given application that is read heavy, therefore also lowering the cost of DynamoDB operations in most cases.
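A back-of-the-envelope estimate shows why a read-heavy application can need less table throughput behind DAX. The sketch below is deliberately simplified: it assumes that every cache hit avoids a read against the table, and it ignores writes, cache warm-up, and item expiry. The cache hit ratio is something you would measure in testing, and the function name is hypothetical.

```python
def reads_reaching_table(reads_per_second, cache_hit_ratio):
    """Estimate reads per second that miss the cache and reach the table.

    Simplified model: each cache hit avoids exactly one table read.
    """
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    return reads_per_second * (1.0 - cache_hit_ratio)


# 10,000 reads per second with a measured 90 percent hit ratio.
remaining = reads_reaching_table(10_000, 0.9)
```

In this example only about 1,000 reads per second reach the table, so read capacity could be provisioned for that level plus headroom rather than for the full 10,000.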
9. Reliability considerations
After you migrate your data to DynamoDB, you should protect your table’s contents with backups so that you can recover from data loss events. Because DynamoDB is a regional service, you should also consider being able to recover from any DynamoDB service failure.
With respect to traditional data protection, DynamoDB supports point-in-time recovery (PITR). Unless you can protect your table contents otherwise or reconstruct it on demand from a fixed location such as Amazon S3, you should enable PITR to protect your table data from application-driven corruption or user error. If you must recover a table, it is recovered in the form of a new table, and you have to reestablish throughput settings and auto scaling limits and related settings.
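Enabling PITR and restoring a table are each a single API call. The sketch below builds the parameters for the `update_continuous_backups` and `restore_table_to_point_in_time` operations in the AWS SDK for Python (boto3). The helper and table names are hypothetical. The restore writes to a new table name, which is why throughput and auto scaling settings have to be reestablished afterward.

```python
def enable_pitr_params(table_name):
    """Parameters for update_continuous_backups that turn on PITR."""
    return {
        "TableName": table_name,
        "PointInTimeRecoverySpecification": {"PointInTimeRecoveryEnabled": True},
    }


def restore_latest_params(source_table, target_table):
    """Parameters for restore_table_to_point_in_time using the latest restorable time."""
    if source_table == target_table:
        raise ValueError("a restore always creates a new table")
    return {
        "SourceTableName": source_table,
        "TargetTableName": target_table,
        "UseLatestRestorableTime": True,
    }


enable = enable_pitr_params("Orders")
restore = restore_latest_params("Orders", "Orders-restored")
# With boto3 installed:
#   client = boto3.client("dynamodb")
#   client.update_continuous_backups(**enable)
#   client.restore_table_to_point_in_time(**restore)
```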
DynamoDB also supports on-demand backup and restore so that you can protect your data separately on a schedule or as needed. There is no performance impact on your tables, and the backup process is completed in seconds. Backups are cataloged and stored for retrieval. These backups also can be used to protect against inadvertent table deletion because they are preserved regardless of table status.
10. Regional resiliency
To deploy your application across multiple AWS Regions or on a worldwide basis, you should consider using global tables. You can use global tables to deploy your DynamoDB tables globally across supported regions by using multimaster replication. It’s important to follow global tables best practices and to enable auto scaling for proper capacity management. Note that strongly consistent reads are available only within the region where the data was written. Reads from any other region in the global table are eventually consistent.
11. Optimizing capacity and spending
After you migrate your data, it’s important for you to optimize your use of DynamoDB. Review the following items for tips on optimizing your capacity and spending on DynamoDB.
AWS recently announced a capability called on-demand capacity mode. When you use on-demand, you allow AWS to adjust your read and write capacity to match peaks and valleys in required throughput, without having to create and maintain an initial capacity and auto scaling model. However, if your application has relatively stable capacity, on-demand may not lower your cost of operation.
It’s important for you to understand DynamoDB pricing:
- When operating, you pay a flat hourly rate based on provisioned capacity, or an on-demand rate when using that capacity mode.
- The price for read capacity units and write capacity units is set regionally.
- You can use auto scaling to help ensure you’re always using the minimum capacity.
- If you can use eventually consistent reads, do so because the same level of capacity supports more eventually consistent than strongly consistent reads.
- Consider and carefully manage interregional data transfer charges if you have deployed DynamoDB tables in multiple regions.
- Use Amazon CloudWatch, AWS Cost Explorer, and AWS Budgets to manage capacity and spending.
- As a DynamoDB customer, you can purchase reserved capacity in advance. If you can predict your need for DynamoDB read and write throughput, reserved capacity offers significant savings over the normal price of DynamoDB provisioned throughput capacity. (Note that reserved capacity does not apply to on-demand capacity mode because you are just paying for the requests you make and not provisioning capacity.)
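A simple model can help compare the two capacity modes for a given workload. In the sketch below, every price is a made-up placeholder and not an actual AWS price; substitute the current rates for your region. The function names and the 730-hour month are also assumptions for the example.

```python
HOURS_PER_MONTH = 730  # common approximation for an average month


def provisioned_monthly_cost(rcu, wcu, price_per_rcu_hour, price_per_wcu_hour):
    """Flat hourly charge for provisioned read and write capacity units."""
    return HOURS_PER_MONTH * (rcu * price_per_rcu_hour + wcu * price_per_wcu_hour)


def on_demand_monthly_cost(read_requests, write_requests,
                           price_per_million_reads, price_per_million_writes):
    """Per-request charge in on-demand capacity mode."""
    return (read_requests / 1_000_000 * price_per_million_reads
            + write_requests / 1_000_000 * price_per_million_writes)


# Placeholder prices only; look up the real regional prices before deciding.
provisioned = provisioned_monthly_cost(100, 50, 0.0001, 0.0005)
on_demand = on_demand_monthly_cost(10_000_000, 2_000_000, 0.25, 1.25)
```

Running the comparison with measured request counts for a steady workload and for a spiky one shows where the break-even point lies between provisioned capacity with auto scaling and on-demand capacity mode.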
As with most AWS optimization activity, it is important to monitor and iterate on your use of DynamoDB to be sure to pay only for the performance and storage resources you need.
In this post, I explained how to evaluate DynamoDB as a potential database service for your application, and how to plan for your migration to DynamoDB. If you have questions or comments about this post, submit them in the comments section below.
About the author
Lex Crosett is an AWS enterprise solutions architect based in Boston.