Choose the best Amazon EBS volume type for your self-managed database deployment
There are many reasons to choose AWS managed services for running a database in the cloud. However, there are times when a managed service may not be the right choice, and a self-managed database instance is best for your organization. Ultimately, when your organization makes that decision, selecting the right components for your self-managed database (DB) is an important step to ensuring that it not only performs to your expectations, but also is cost optimized.
Amazon Elastic Block Store (EBS) has a choice of volume offerings, but selecting the right Amazon EBS volume type for your database does not need to be difficult. In this blog, I discuss several considerations and rationales for selecting the most appropriate Amazon EBS volume type for running self-managed workloads on AWS.
Understanding Amazon EBS Volumes
For workloads on AWS, there are two types of Amazon EBS volumes that are recommended:
- Solid state drives (SSD): Optimized for transactional workloads involving frequent read/write operations with small input/output (I/O) size, where the dominant performance attribute is I/O operations per second (IOPS).
- Hard disk drives (HDD): Optimized for large streaming workloads where the dominant performance attribute is throughput.
For the purposes of this blog, we focus on the SSD volume types because they are better suited to database workloads. Below is a quick chart outlining some of the key capabilities the SSD volume types offer. More detail can be found on the EBS documentation page.
EBS General Purpose SSD and Provisioned IOPS SSD Specifications
At first glance, the preceding table might make it seem obvious that io2 or io2 Block Express is the best choice for running your database. The initial assumption is that all databases need a volume type with high IOPS and potentially high throughput. Most would immediately select either io2 or io2 Block Express for any IOPS- or throughput-intensive workload. Before we make a decision, it’s important to understand some of the key attributes of each of the volume types.
The gp2 volume type is the most commonly deployed for a broad range of use cases such as system volumes, development and test environments, and other low-latency applications. gp2 ties its performance to its volume size and offers a baseline performance of 3 IOPS per GiB of provisioned volume size. For volumes under 1,000 GiB, gp2 offers the ability to burst up to 3,000 IOPS based on a burst credit balance described here. Throughput performance is calculated based on volume size up to the throughput limit of 250 MiB/s and is described here.
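The gp2 sizing rule above can be sketched in a few lines of Python. This is a minimal illustration, not an official calculator: the 3 IOPS per GiB rate and the 3,000 IOPS burst for volumes under 1,000 GiB come from the text, while the 100 IOPS floor and 16,000 IOPS ceiling are assumptions taken from the published gp2 limits. The function names are hypothetical.

```python
import math

GP2_IOPS_PER_GIB = 3              # from the text: 3 IOPS per provisioned GiB
GP2_BURST_IOPS = 3_000            # from the text: burst ceiling for small volumes
GP2_BURST_SIZE_LIMIT_GIB = 1_000  # from the text: only volumes under this size burst
GP2_MIN_IOPS = 100                # assumption: published gp2 minimum baseline
GP2_MAX_IOPS = 16_000             # assumption: published gp2 maximum baseline


def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS for a gp2 volume, clamped to the assumed floor and ceiling."""
    return max(GP2_MIN_IOPS, min(GP2_MAX_IOPS, GP2_IOPS_PER_GIB * size_gib))


def gp2_can_burst(size_gib: int) -> bool:
    """True if the volume is small enough to burst above its baseline."""
    return size_gib < GP2_BURST_SIZE_LIMIT_GIB


def gp2_size_for_iops(target_iops: int) -> int:
    """Smallest gp2 size (GiB) whose baseline meets the target IOPS."""
    if target_iops > GP2_MAX_IOPS:
        raise ValueError("target exceeds the assumed gp2 per-volume maximum")
    return math.ceil(target_iops / GP2_IOPS_PER_GIB)


if __name__ == "__main__":
    print(gp2_baseline_iops(150), gp2_can_burst(150))
    print(gp2_size_for_iops(8_000))
```

The takeaway is that with gp2 the only way to raise sustained IOPS is to provision more capacity, whether or not you need the space.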
The gp3 volume type was released on December 1, 2020 to provide our customers with a volume type suited exactly to their needs. These volumes provide a baseline performance of 3000 IOPS, and 125 MiB/s independent of volume size. With gp3 volumes, AWS allows our customers to modify both the IOPS and the throughput capabilities of each volume to help match a given workload requirement. Specifics on proper ratios of volume size to IOPS to throughput can be found here.
Next, we’ll talk about the Provisioned IOPS SSD family of volume types. Generally speaking, io1, io2, and io2 Block Express were created to provide our customers with a volume type designed for very I/O-intensive workloads, with a single volume being able to scale beyond 16,000 IOPS. While gp2 and gp3 volumes are designed to deliver at least 90% of their performance 99% of the year, io1, io2, and io2 Block Express volumes are designed to deliver 90% of their performance 99.9% of the year.
A key difference between io1 and io2 is that the latter (and io2 Block Express) has 99.999% durability. This translates to roughly a 0.001% annual failure rate, as compared to the 0.1%–0.2% annual failure rate of the gp2, gp3, or io1 volume types. We’ll get into why this can be an important consideration a little later in this post.
For io2 volumes, the Region and the instance type you attach to determine whether an io2 or io2 Block Express volume is deployed. Any new or existing io2 volumes attached to Amazon EC2 X2idn, X2iedn, R5b, and C7g instances automatically run on io2 Block Express, and io2 Block Express volumes are available in all Regions where these instances are available. io2 Block Express supports standard io2 features such as Multi-Attach and Elastic Volumes, but a key difference between io2 and io2 Block Express is that the latter can provision a larger volume size and a more IOPS-dense volume. Specifics on io1, io2, and io2 Block Express can be found on our documentation page here.
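To make the durability figures concrete, here is a small Python sketch that converts a durability percentage into the implied annual failure rate and estimates how many volume failures a fleet might see in a year. The conversion (100% minus durability) mirrors the text; the fleet estimate assumes failures are independent, which is a simplification of mine, and the function names are hypothetical.

```python
def annual_failure_rate_percent(durability_percent: float) -> float:
    """Annual failure rate (%) implied by a durability figure (%)."""
    return 100.0 - durability_percent


def expected_failures_per_year(volume_count: int, durability_percent: float) -> float:
    """Expected failed volumes per year, assuming independent failures."""
    return volume_count * annual_failure_rate_percent(durability_percent) / 100.0


if __name__ == "__main__":
    fleet_size = 1_000  # hypothetical fleet size, for illustration only
    for label, durability in [
        ("io2 / io2 Block Express", 99.999),
        ("gp2, gp3, io1 (upper bound)", 99.9),
        ("gp2, gp3, io1 (lower bound)", 99.8),
    ]:
        afr = annual_failure_rate_percent(durability)
        failures = expected_failures_per_year(fleet_size, durability)
        print(f"{label}: AFR {afr:.3f}%, ~{failures:.2f} failures per {fleet_size} volumes")
```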
Considerations when selecting the appropriate Amazon EBS volume type
Now that we understand the capabilities of the SSD volume types typically considered for a self-managed database, it’s important to understand how the workload requirements and capabilities of your database drive volume selection.
As mentioned earlier, just because at first glance an io1 or io2 volume has a higher IOPS capability per volume, it doesn’t mean that we can’t achieve a similar IOPS capability using a different and potentially more cost-effective volume type.
For example, let’s say you had the following set of requirements for your database application:
- 8000 IOPS
- Millisecond-latency requirements
- 125-MiB/s throughput
- Minimum free space of 150 GB
There are a few volume types that will be able to handle this database in terms of performance. Let’s take a look at each one and discuss the pros and cons:
gp2: To get 8000 IOPS from a gp2 volume, we would need to provision a volume of at least 2,667 GB (8000 IOPS / 3 IOPS per GB ≈ 2,667 GB, or about 2.67 TB) at a cost of $266.70 per month (2,667 GB * $0.10).
gp3: 150-GB volume provisioned at 8000 IOPS and 125-MiB/s throughput at a cost of $37.00 per month: (150 GB * $0.08) + (5000 billable IOPS * $0.005). The first 3000 IOPS are provided at no cost as part of the baseline for a gp3 volume.
io1: To get 8000 IOPS from an io1 volume, we need to maintain a 50:1 IOPS-to-storage ratio. Therefore, a 160-GB volume is needed (8000 IOPS / 50 IOPS per GB) at a cost of $540.00 per month: (160 GB * $0.125) + (8000 IOPS * $0.065).
io2: 150-GB volume provisioned at 8000 IOPS at a cost of $538.75 per month: (150 GB * $0.125) + (8000 IOPS * $0.065).
In this example, two volumes stand out as potential winners: gp3 and io2. gp3 stands out because it can be a very cost-effective volume for this workload. io2 stands out due to its high durability and higher performance guarantees. gp2 is not the best choice here because you must over-provision the size of the volume to achieve the desired IOPS. Also, io2 is a better choice than io1 because you do not have to over-provision the volume, and it gives you additional durability (99.999%) at a lower price point.
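The arithmetic behind these four quotes can be reproduced with a short Python sketch. The sizes, billable IOPS, and unit prices are the ones quoted above and will differ by Region and over time; the `VolumeQuote` class and its field names are hypothetical helpers for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VolumeQuote:
    """One line of the comparison; prices are the per-month figures from the text."""
    name: str
    size_gb: int
    billable_iops: int
    price_per_gb: float
    price_per_iops: float

    def monthly_cost(self) -> float:
        storage = self.size_gb * self.price_per_gb
        iops = self.billable_iops * self.price_per_iops
        return round(storage + iops, 2)


QUOTES = [
    # gp2 has no separate IOPS charge; size is driven by 8000 / 3 IOPS per GB.
    VolumeQuote("gp2", 2_667, 0, 0.10, 0.0),
    # gp3 includes 3000 IOPS in the baseline, so only 5000 are billable.
    VolumeQuote("gp3", 150, 5_000, 0.08, 0.005),
    # io1 needs 160 GB to satisfy the 50:1 IOPS-to-storage ratio.
    VolumeQuote("io1", 160, 8_000, 0.125, 0.065),
    VolumeQuote("io2", 150, 8_000, 0.125, 0.065),
]

if __name__ == "__main__":
    for quote in sorted(QUOTES, key=lambda q: q.monthly_cost()):
        print(f"{quote.name}: ${quote.monthly_cost():.2f} per month")
```

Keeping the inputs in one table like this makes it easy to rerun the comparison with current prices for your own Region.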
Before we dive into some other considerations, let’s take a more extreme example where your workload may need hundreds of thousands of IOPS across a cluster of systems.
In this scenario let’s say you were presented with the following requirements:
- 4-node Oracle Database 19c with Oracle ASM
- 320,000 IOPS
- 2,800-MiB/s throughput
- 48-TB available storage
- Millisecond response times
With this set of requirements, there are a number of options we have to deploy this solution. Oracle Automatic Storage Management (ASM) does a great job at distributing the database files across a number of volumes, so we’ll try to distribute the data as evenly as possible across each of the four nodes. We follow Oracle’s best practice recommendations and configure a minimum of four volumes per node.
For our tests, we ran a 70:30 read/write ratio test across our four Oracle server nodes for both io2 Block Express and gp3 volume types. Let’s look at the configuration for each volume type.
For io2, we provision four volumes per node, each provisioned at 3 TB and 22,250 IOPS. This gives the cluster a total of 356,000 IOPS, so that even at 90% of provisioned performance (the level io2 is designed to deliver 99.9% of the time), the cluster still provides 320,400 IOPS, which covers the 320,000 IOPS requirement.
For gp3, we also have the flexibility to provision IOPS up to 16,000 IOPS per volume. In this example, in order to evenly distribute the 48-TB storage amongst the four clustered nodes, we use six volumes per node provisioned at 2 TB each. To get to 356,000 IOPS, we provision each volume with 14,834 IOPS giving us a total of 356,016 IOPS.
For both scenarios, we deploy EBS-optimized r5b.12xlarge instances, each of which gives us a maximum of 130,000 IOPS.
Now let’s look at the price breakdown below for each of these options using the AWS pricing calculator (disclaimer: all pricing is as of this writing and uses us-east-2 public pricing; your pricing may vary):
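The two layouts can be sanity-checked with a short Python sketch. All inputs are the numbers given above; the 90% factor is the design target quoted earlier in this post, applied here to both volume types for comparison. The function and field names are hypothetical.

```python
def cluster_layout(nodes, volumes_per_node, size_tb, iops_per_volume,
                   instance_iops_limit, required_iops, design_percent=90):
    """Summarize capacity and IOPS for an evenly distributed cluster layout."""
    total_volumes = nodes * volumes_per_node
    total_iops = total_volumes * iops_per_volume
    per_node_iops = volumes_per_node * iops_per_volume
    # Integer arithmetic keeps the 90% figure exact (rounded down).
    iops_at_design = total_iops * design_percent // 100
    return {
        "total_volumes": total_volumes,
        "capacity_tb": total_volumes * size_tb,
        "total_iops": total_iops,
        "per_node_iops": per_node_iops,
        "iops_at_design_percent": iops_at_design,
        "meets_requirement": iops_at_design >= required_iops,
        "fits_instance_limit": per_node_iops <= instance_iops_limit,
    }


# io2: four 3-TB volumes per node at 22,250 IOPS each.
IO2 = cluster_layout(4, 4, 3, 22_250, 130_000, 320_000)
# gp3: six 2-TB volumes per node at 14,834 IOPS each.
GP3 = cluster_layout(4, 6, 2, 14_834, 130_000, 320_000)

if __name__ == "__main__":
    for name, layout in (("io2", IO2), ("gp3", GP3)):
        print(name, layout)
```

Both layouts reach 48 TB, and each node drives roughly 89,000 provisioned IOPS, which stays under the 130,000 IOPS per-instance maximum.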
As you can see from the above table, there is a significant price difference between the two solutions. Which one is right for your workload? It will be hard to tell based on the simple requirements provided at the start of this exercise, so here’s where the additional considerations come into play.
Additional considerations for choosing an Amazon EBS volume type
As you begin to evaluate which volume-type deployment makes sense for you, you should start factoring other considerations into your decision making. A few that may factor into your decision-making process are provided in the following list; however, this is by no means an exhaustive list:
- DB Replication: Will your database use built-in mechanisms to replicate your data to another Availability Zone (AZ) or another Region?
- Performance: Do you need the extra performance of io2 or io2 Block Express? Or will the performance of gp3 be sufficient?
- Database instance type: Do you need a bursting instance type or one with dedicated bandwidth?
Let’s break down the database replication consideration. If your database requires you to deploy a second (duplicate) set of infrastructure in another Availability Zone (AZ) for a high availability deployment, would it make sense to pay for a duplicate set of io2 volumes in each AZ? Remember, io2 provides 100x more durability than a gp3 volume (99.999% as compared to 99.8–99.9% durability); however, since you are making the database highly available in another AZ, does paying extra for the io2 volumes in one or both AZs make sense for your business? In most cases, having a replicated copy of the database in another AZ not only ensures greater availability of your database, but also reduces your chances of data loss due to correlated failures.
Performance certainly plays a very important role in selecting the most appropriate volume type for your workload. Understanding the characteristics of your workload requirements is critical to ensure your applications function efficiently for your business. According to AWS documentation, the latency difference between io2 (sub-millisecond) and gp3 (single-digit millisecond) may make a difference in your selection. While you may find that your gp3 volumes can deliver similar IOPS performance to io2 Block Express (as seen in our example above), the difference is that io2 and io2 Block Express deliver consistent sub-millisecond response times, which can be key for mission-critical workloads.
In our testing, we used a synthetic load generator for Oracle called Silly Little Oracle Benchmark (SLOB), which is a popular tool used to simulate workloads on Oracle databases. The results for this test were surprisingly similar with both volume types achieving sub-millisecond response times for the duration of the test, as well as achieving the desired IOPS and throughput requirements with ease.
A synthetic load generator is only a simulation of a workload, and every workload has its own specific characteristics that make it difficult to replicate with a load generator. Therefore, the results achieved are only indicative of the specific tests that were run. But here is where the power of AWS comes to bear: if your workload performance requirements change, you can change your volume type whenever you need with zero downtime for your application. If you decide to deploy gp3 volumes but later determine that their performance characteristics are not meeting your needs, you can easily modify your volume and convert it to io2, or vice versa. This would be difficult and expensive to achieve in an on-premises implementation.
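As a rough sketch of what such a conversion could look like with the AWS SDK for Python, the snippet below builds the arguments for the EC2 `modify_volume` call and sends them with boto3. The helper names and the volume ID are hypothetical placeholders; the sketch assumes boto3 is installed and credentials are configured, and it is not a complete change procedure.

```python
from typing import Any, Dict, Optional


def build_modify_volume_request(volume_id: str, volume_type: str,
                                iops: Optional[int] = None,
                                throughput_mib_s: Optional[int] = None) -> Dict[str, Any]:
    """Build keyword arguments for the EC2 ModifyVolume API (hypothetical helper)."""
    # The Throughput parameter is only valid for gp3 volumes.
    if throughput_mib_s is not None and volume_type != "gp3":
        raise ValueError("throughput can only be set for gp3 volumes")
    request: Dict[str, Any] = {"VolumeId": volume_id, "VolumeType": volume_type}
    if iops is not None:
        request["Iops"] = iops
    if throughput_mib_s is not None:
        request["Throughput"] = throughput_mib_s
    return request


def apply_modification(request: Dict[str, Any], region: str) -> Dict[str, Any]:
    """Send the request with boto3; imported lazily so the builder works without it."""
    import boto3

    ec2 = boto3.client("ec2", region_name=region)
    return ec2.modify_volume(**request)


if __name__ == "__main__":
    # "vol-0123456789abcdef0" is a placeholder, not a real volume.
    to_io2 = build_modify_volume_request("vol-0123456789abcdef0", "io2", iops=8_000)
    print(to_io2)
    # apply_modification(to_io2, region="us-east-2")  # uncomment to run against AWS
```

Separating the request builder from the API call keeps the change reviewable before anything is sent to AWS.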
EC2 instance type
It’s important to understand that Amazon EC2 instance types can also play a critical role in the maximum performance of your attached Amazon EBS volumes. It is recommended to deploy Amazon EBS-optimized instances when deploying Amazon EBS volumes for production workloads. The charts provided on the Amazon EBS-optimized instance page indicate the Amazon EBS bandwidth, throughput (MB/s), and IOPS capabilities of your instances. Smaller instance types, which are indicated with an asterisk on the Amazon EBS-optimized instances page, have burstable performance. Those instances should generally be avoided for production use, unless you understand your workload completely.
In this blog, we discussed how to choose the best EBS storage volume for your self-managed database. We reviewed EBS volume types and what you should consider when selecting the appropriate EBS volume. In addition, we reviewed specific performance and cost attributes for EBS gp2, gp3, io1, io2, and io2 Block Express volumes. Finally, we reviewed examples of performance benchmarks and how the EC2 instance type plays a critical role in EBS volume performance.
When database workload performance is a priority over cost, only the fastest and most durable volume types can fulfill those requirements. But that doesn’t mean that those are the only volume types that should be considered. Because AWS gives you so many choices, selecting the right infrastructure components plays a critical role in ensuring that you can not only strike a balance between price and performance, but also adhere to your business requirements and objectives. It’s important to factor in other requirements to lower your risk of selecting a certain volume type.
After your workload has been running, check out AWS Compute Optimizer to identify potential areas where you may be either under- or over-provisioned. This is an important step to ensure you’re using the optimal volume and compute instance types for your workload.
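The same limits can also be read programmatically. The sketch below summarizes the `EbsOptimizedInfo` fields returned by the EC2 `describe_instance_types` call. The helper names are hypothetical, the sample record uses made-up numbers rather than real instance limits, and treating a baseline below the maximum as the sign of burstable EBS performance is my own reading rather than an official rule.

```python
from typing import Any, Dict


def summarize_ebs_limits(instance_type_info: Dict[str, Any]) -> Dict[str, Any]:
    """Extract EBS-optimized limits from one DescribeInstanceTypes record."""
    optimized = instance_type_info["EbsInfo"]["EbsOptimizedInfo"]
    return {
        "instance_type": instance_type_info["InstanceType"],
        "baseline_iops": optimized["BaselineIops"],
        "maximum_iops": optimized["MaximumIops"],
        "baseline_throughput_mbps": optimized["BaselineThroughputInMBps"],
        "maximum_throughput_mbps": optimized["MaximumThroughputInMBps"],
        # Assumption: a baseline below the maximum indicates burstable EBS performance.
        "burstable_ebs": optimized["BaselineIops"] < optimized["MaximumIops"],
    }


def fetch_instance_type_info(instance_type: str, region: str) -> Dict[str, Any]:
    """Look up one instance type with boto3 (needs boto3 and AWS credentials)."""
    import boto3

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.describe_instance_types(InstanceTypes=[instance_type])
    return response["InstanceTypes"][0]


# Hypothetical record with made-up numbers, shaped like the API response.
SAMPLE_RECORD = {
    "InstanceType": "example.large",
    "EbsInfo": {
        "EbsOptimizedInfo": {
            "BaselineIops": 3_000,
            "MaximumIops": 12_000,
            "BaselineThroughputInMBps": 80.0,
            "MaximumThroughputInMBps": 300.0,
        }
    },
}

if __name__ == "__main__":
    print(summarize_ebs_limits(SAMPLE_RECORD))
    # print(summarize_ebs_limits(fetch_instance_type_info("r5b.12xlarge", "us-east-2")))
```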