Running spiky workloads and optimizing costs by more than 90% using Amazon DynamoDB on-demand capacity mode

This is a guest post by Keisuke Utsumi, a Software Engineer with TVer Technologies Inc. In their own words, “TVer Technologies Inc. offers interactive entertainment services to users using a synchronized website with a TV broadcast.”

TVer Technologies Inc. provides website and app-based interactive content for TV viewers in Japan. Many of our applications use Amazon DynamoDB as a database to store registered user information and to log the history of users’ voting activity in live voting events during TV broadcasts. Our applications are often used by daily morning shows and seasonal pop music shows. In this blog post, we review how we optimized costs and performance for a system used in a live TV voting event by using DynamoDB’s on-demand read/write capacity mode.

For most of our live-voting projects, we see user access for only a couple of hours, because customer activity is limited to the TV program’s runtime. Within those few hours, spikes in access requests last only a few minutes. The workload during non-peak times is almost nothing compared to peak times: the ratio is anywhere from 1:100 to 1:10,000.

The following graph shows a record of access requests to our web service during a TV program that drove viewers to vote on a website. Because nearly all requests came from viewers’ voting activity, there were no requests when there was no voting. Specifically, between 19:30 and 20:15, there were no requests because there was no voting activity from users. Then at 20:15, we saw a spike for a few minutes because viewers started voting and the system started logging users’ data. This pattern of brief spikes during voting periods repeated at irregular intervals until the program ended at 22:30. Because Amazon CloudWatch Logs collected the record, the numbers are averaged per minute. The actual number of requests recorded at peak was two to three times larger than the per-minute averages shown.
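To illustrate why per-minute averages understate short bursts, here is a minimal sketch. The traffic shape (a 20-second burst at 300 requests per second inside an otherwise idle minute) is a made-up example, not a figure from our logs.

```python
def per_minute_average(per_second_counts):
    """Average requests per second over consecutive 60-second windows."""
    averages = []
    for start in range(0, len(per_second_counts), 60):
        window = per_second_counts[start:start + 60]
        averages.append(sum(window) / len(window))
    return averages


# Hypothetical minute: a 20-second burst at 300 requests/s, then silence.
one_minute = [300] * 20 + [0] * 40

avg = per_minute_average(one_minute)[0]
peak = max(one_minute)
print(f"per-minute average: {avg:.0f} req/s, true peak: {peak} req/s")
```

In this invented case the database has to absorb three times the rate that the averaged graph suggests, which is why capacity has to be sized for the burst rather than for the average.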

Why Amazon DynamoDB on-demand?

In our case, we found Amazon DynamoDB on-demand to be most useful. We could have used DynamoDB auto scaling, but in the case of abrupt or unexpected spikes in requests due to unplanned promotions in a TV program, DynamoDB auto scaling would not be able to catch up fast enough. With DynamoDB on-demand, we are able to save money and reduce manual interventions without any delay.

Some live programs do not have a strict schedule of events during the program. Therefore, it is difficult to predict in advance when traffic will spike. If we pre-provisioned DynamoDB capacity in preparation for peak traffic, we would have to pay for those resources regardless of when the peak actually occurred. Before DynamoDB on-demand was available, we scaled up capacity manually an hour before the TV program started to accommodate the potential spikes, then scaled down after the TV program finished. This process of preparing for sudden spikes was costly.

DynamoDB on-demand enabled us to pay per request, which is optimal for services like ours that receive short bursts of requests with long periods of low traffic in between. The following graph shows the Request Count metric from one of our television programs. We received a low number of requests, followed by short peaks tied to the content of the television program between 21:15 and 22:30.
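The trade-off between paying for reserved capacity and paying per request can be sketched with a little arithmetic. Every number below is a placeholder chosen for illustration: the prices are not real AWS prices, and the traffic figures are not ours. Check the current DynamoDB pricing page before drawing conclusions for your own workload.

```python
def provisioned_cost(wcu, hours, price_per_wcu_hour):
    """Cost of holding `wcu` write capacity units for `hours`, used or not."""
    return wcu * hours * price_per_wcu_hour


def on_demand_cost(write_requests, price_per_million_writes):
    """Cost of paying only for the write requests actually made."""
    return write_requests / 1_000_000 * price_per_million_writes


# Placeholder inputs: NOT real AWS prices and NOT real traffic figures.
PRICE_PER_WCU_HOUR = 0.001
PRICE_PER_MILLION_WRITES = 1.5
peak_wcu = 10_000               # capacity sized for the largest expected spike
show_hours = 4                  # one hour of lead time plus a three-hour show
writes_during_show = 3_000_000  # total writes actually received

provisioned = provisioned_cost(peak_wcu, show_hours, PRICE_PER_WCU_HOUR)
on_demand = on_demand_cost(writes_during_show, PRICE_PER_MILLION_WRITES)
print(f"provisioned for the peak: ${provisioned:.2f}")
print(f"on-demand, pay per request: ${on_demand:.2f}")
```

The point of the sketch is the shape of the two formulas, not the numbers: provisioned cost grows with the peak you prepare for and the hours you hold it, while on-demand cost grows only with the requests you actually serve. For a table that is busy most of the time, the comparison can easily go the other way.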

In production

We implemented DynamoDB on-demand the morning after it launched at re:Invent 2018. The configuration is easy: it requires only updating the table’s BillingMode to PAY_PER_REQUEST. We configured DynamoDB on-demand at 2:00 p.m. and had a broadcast scheduled for 9:00 p.m. that same day. We completed a load test at 4:00 p.m. without any problems.
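For readers who prefer the API to the console, the change is a single UpdateTable call, where the on-demand value of BillingMode is spelled PAY_PER_REQUEST. The following is a minimal sketch rather than our production script. The helper name switch_to_on_demand and the table name in the usage comment are made up, and the client is passed in so the function can be exercised without AWS credentials.

```python
def switch_to_on_demand(client, table_name):
    """Switch one table to on-demand billing via the UpdateTable API.

    `client` is expected to behave like boto3.client("dynamodb").
    """
    return client.update_table(
        TableName=table_name,
        BillingMode="PAY_PER_REQUEST",
    )


# Typical use (requires the boto3 package and AWS credentials):
#   import boto3
#   switch_to_on_demand(boto3.client("dynamodb"), "votes")  # "votes" is a made-up name
```

UpdateTable is asynchronous, so a real script would likely wait for the table to return to ACTIVE status before relying on the new mode.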

Before we switched to DynamoDB on-demand, it took time to scale down the many DynamoDB tables manually after every show. Right after DynamoDB on-demand became generally available, we started switching 300 DynamoDB tables to on-demand mode. We no longer have to take the time to scale down manually after broadcasts, and developers can now concentrate on other tasks. Switching between provisioned and on-demand mode requires a simple selection in the DynamoDB table Capacity tab in the AWS Management Console, shown in the following screenshot.
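With hundreds of tables, clicking through the console gets tedious, so a scripted loop is a natural alternative. The sketch below is an assumption about how such a loop could look, not a description of how we actually migrated. The function name is hypothetical, and the client is passed in so it can be any object that behaves like boto3.client("dynamodb").

```python
def switch_all_to_on_demand(client):
    """Switch every provisioned table the client can list to on-demand.

    Returns the names of the tables for which an update was requested.
    """
    switched = []
    for page in client.get_paginator("list_tables").paginate():
        for name in page["TableNames"]:
            table = client.describe_table(TableName=name)["Table"]
            # BillingModeSummary is absent for tables that have only ever
            # been provisioned, so fall back to PROVISIONED.
            summary = table.get("BillingModeSummary", {})
            mode = summary.get("BillingMode", "PROVISIONED")
            if mode != "PAY_PER_REQUEST":
                client.update_table(
                    TableName=name,
                    BillingMode="PAY_PER_REQUEST",
                )
                switched.append(name)
    return switched
```

A production version would probably also pace the calls and handle errors, because table updates are asynchronous and AWS limits how many can be in progress at once.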

Measuring cost savings

We use DynamoDB in multiple services and set 95% of our tables to on-demand. These tables were previously configured to use provisioned billing. As a result, we saw the following cost-saving improvements in one month:

Platform service – This is the base service of all other services. Monthly cost decreased from $1,981 to $198, a savings of 90%.
Gateway server – This server communicates with the TV program. Monthly cost decreased from $2,587 to $95, a savings of about 96%.
Audience requests – This service accepts requests from the audience for gift drawings. Monthly cost decreased from $613 to $0.58, a savings of 99.9%.
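The savings percentages follow directly from the before and after figures. As a quick check, this short sketch recomputes them from the monthly costs listed above and adds a combined figure across the three services. The combined figure is derived here and is not a number we measured separately.

```python
def savings_percent(before, after):
    """Percentage saved when a cost drops from `before` to `after`."""
    return (before - after) / before * 100


# Monthly costs in USD (before, after), taken from the list above.
monthly_costs = {
    "Platform service": (1981, 198),
    "Gateway server": (2587, 95),
    "Audience requests": (613, 0.58),
}

for service, (before, after) in monthly_costs.items():
    print(f"{service}: {savings_percent(before, after):.1f}% saved")

total_before = sum(before for before, _ in monthly_costs.values())
total_after = sum(after for _, after in monthly_costs.values())
combined = savings_percent(total_before, total_after)
print(f"Combined: {combined:.1f}% saved")
```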

Challenges from this point

When you create a new DynamoDB table in on-demand mode, it starts with capacity for up to 4,000 WCU or 12,000 RCU by default. An on-demand table can instantly accommodate traffic spikes of up to 200% of (that is, double) its previous high-water mark, and that ceiling can double again every half hour. But sometimes we needed a lot more capacity and didn’t want to wait for the table to scale.
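To get a feel for how long "waiting for the table to scale" can take, here is a simplified model of the doubling behavior described above. It assumes the instantly available capacity doubles exactly once every 30 minutes, which in practice only happens if traffic actually pushes the table to its current ceiling each time. The function name and the target values are invented for illustration.

```python
def minutes_to_reach(target_wcu, instant_wcu=4_000, doubling_minutes=30):
    """Minutes until `target_wcu` fits under the table's instant capacity.

    Simplified model: capacity starts at `instant_wcu` and doubles once
    every `doubling_minutes`.
    """
    minutes = 0
    capacity = instant_wcu
    while capacity < target_wcu:
        capacity *= 2
        minutes += doubling_minutes
    return minutes


# Hypothetical targets for a brand-new on-demand table.
for target in (4_000, 8_000, 40_000):
    print(f"{target} WCU: {minutes_to_reach(target)} minutes")
```

Under this model, a new table needs four doublings, about two hours of sustained ramp-up, before it can absorb a 40,000 WCU burst. That is too slow for a spike that arrives within minutes of a TV host announcing a vote.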

In such cases, we would “pre-warm” the table: provision it with enough WCU and RCU to meet our requirements, and then switch it to on-demand mode. This works because of how DynamoDB determines the initial capacity of an on-demand table that previously used provisioned mode. The previous peak is taken to be the higher of two values: half the maximum write and read capacity units previously provisioned for the table, or the settings for a newly created table with on-demand capacity mode. For more details, see Initial Throughput for On-Demand Capacity Mode in the DynamoDB Developer Guide.
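The pre-warming rule can be written down as a small calculation. This sketch assumes that the previous peak of a newly created on-demand table is 2,000 WCU and 6,000 RCU, that is, half of the 4,000 WCU and 12,000 RCU such a table can serve immediately. Confirm these defaults in the Developer Guide before relying on them. The function names are hypothetical.

```python
# Assumption: previous peak of a newly created on-demand table, i.e. half
# of the 4,000 WCU / 12,000 RCU it can serve immediately.
NEW_TABLE_PEAK = {"write": 2_000, "read": 6_000}


def previous_peak(max_provisioned, kind):
    """Previous peak DynamoDB assumes after a switch from provisioned mode.

    It is the higher of half the maximum provisioned capacity and the
    default for a new on-demand table. `kind` is "write" or "read".
    """
    return max(max_provisioned / 2, NEW_TABLE_PEAK[kind])


def instant_capacity(max_provisioned, kind):
    """Traffic the table can absorb right after the switch (double the peak)."""
    return 2 * previous_peak(max_provisioned, kind)


# Pre-warming with 40,000 WCU yields 40,000 WCU of instant capacity.
print(instant_capacity(40_000, "write"))
# Provisioning less than the new-table default gains nothing.
print(instant_capacity(1_000, "write"))
```

In other words, under these assumptions the capacity you provision before the switch is the capacity you can use immediately after it, as long as it exceeds what a new on-demand table would offer anyway.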

Note that after you switch a table to on-demand and then back to provisioned, you cannot switch it to on-demand again for 24 hours. During this 24-hour period, you must adjust provisioned WCU and RCU to accommodate your application traffic (or just turn on auto scaling).

Conclusion

If you also use DynamoDB for a service that receives requests with several peak times and experiences abrupt workload spikes, you too can benefit from switching to on-demand mode. However, if your service receives steady traffic volume without abrupt spikes, you are likely better off using DynamoDB provisioned mode with auto scaling. DynamoDB on-demand mode is also a great cost-saving measure in development and test environments, because such environments usually receive sporadic request volume. Choosing on-demand mode when the use case warranted it enabled TVer Technologies Inc. to save money and reduce operational load.

About the Author

Keisuke Utsumi is a Software Engineer with TVer Technologies Inc.

AWS Database Blog