The world of traditional on-premises IT has become reasonably accurate at forecasting operating costs. We’ve been running IT in a very similar, familiar way for a long time, and we’ve become good at it. When adopting cloud, it can be helpful to reevaluate how we think about cost management, particularly around fixed-cost models and consumption-based services. Below are some ideas that have helped me be successful in this area.
When deploying a new application on-premises, we tend to reuse time-tested infrastructure designs composed of a few general-purpose components, such as physical servers and hypervisors for compute, SAN and NAS devices for storage, and relational databases for data. The architectures we use are well understood and fall into well-established patterns. As a result, the costs tend to fall into a somewhat narrow range.
By contrast, at AWS we have access to more than 125 services (not counting the 3,000+ products in the AWS Marketplace), and have the ability to be more creative and use more specific, purpose-built components when designing solutions. Because there are many different ways to solve any particular problem, there ends up being a greater variety of implementations.
This makes the cost range for any particular workload in AWS wider than on-premises. This wider cost range allows for infrastructure refactoring in AWS to have an impact on overall costs beyond what is possible on-premises. And thanks to pay-as-you-go pricing, AWS makes it possible to change out components in your architecture as often as you like, making infrastructure refactoring not only viable but repeatable.
Sources of Cost Increases
AWS prices get lower every year (AWS has reduced prices 67 times since launching in 2006). However, costs increase when consumption increases. Let’s take a look at some common sources of consumption increases so we can understand how to counteract them.
First, there is a natural asymmetry between the provisioning and decommissioning of resources. We always know when we need something, but we don’t always remember when we no longer need it. Because of this, unused resources tend to accumulate. This principle applies at the application portfolio level, in the form of an inflated application count, as well as at the infrastructure level, in the form of unused servers, storage, and other components. We’re not used to actively decommissioning resources because in the fixed-cost model the payoff isn’t as significant, but in the consumption-based model actively decommissioning resources is vital.
Second, performance tuning is typically only performed in response to performance problems. When too few resources are allocated, we feel it in the form of poor application performance, so we add resources, such as upsizing EC2 instances and adding Provisioned IOPS to EBS storage volumes. But when too many resources are allocated it is essentially invisible—perfectly allocated resources and over-allocated resources feel identical in the context of application performance. Performance and cost are directly correlated in a consumption-based model, so we should be looking for opportunities to reduce performance where possible in order to reduce cost. In the fixed-cost model investments in performance are a sunk cost, but in the consumption-based model the costs are recoverable.
Continuous Optimization Tenets
Continuous optimization is an iterative process where we implement a set of simple, high-impact cost reduction methods across all applications, and then measure and report the cost savings results. The process is then repeated on a regular cadence. Below are two essential tenets of continuous optimization.
Tenet 1: Cost optimization is not a project, it’s a way of life.
We are never finished with continuous optimization. It is integrated into our existing operating procedures and we work to improve the process every cycle. The process is designed to be low-cost and low-overhead. And within those limitations, continuous optimization is designed to find out exactly what level of super-optimization is possible. “How inexpensively can we run each application?” is the question to be answered.
Tenet 2: Focus on big impact/low effort.
Each optimization idea should be ranked by its impact/effort ratio. Implement ideas starting from the top of the list and progress downward until you reach the point where the effort exceeds the impact. This line will be drawn in a different place by different organizations, and can change over time to suit the business priorities. Below, I give some examples of the ideas I’ve implemented as a starting point.
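The ranking described in this tenet can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Idea` class, the example ideas, and all dollar figures are hypothetical, and impact and effort are both expressed in dollars over the same period so that they can be compared directly.

```python
from dataclasses import dataclass


@dataclass
class Idea:
    name: str
    impact: float  # estimated savings in dollars (hypothetical figure)
    effort: float  # estimated cost of doing the work in dollars (hypothetical figure)

    @property
    def ratio(self) -> float:
        # An idea that costs nothing to implement always ranks first.
        return float("inf") if self.effort == 0 else self.impact / self.effort


def prioritize(ideas):
    """Return the ideas worth implementing, best impact/effort ratio first."""
    ranked = sorted(ideas, key=lambda idea: idea.ratio, reverse=True)
    # Stop at the point where the effort exceeds the impact.
    return [idea for idea in ranked if idea.impact > idea.effort]


ideas = [
    Idea("Downsize instances", impact=900, effort=300),
    Idea("Refactor batch job", impact=200, effort=800),
    Idea("Delete orphan volumes", impact=400, effort=50),
]

for idea in prioritize(ideas):
    print(f"{idea.name}: ratio {idea.ratio:.1f}")
```

Where the cutoff sits is a business decision. An organization that values savings more highly than engineering time could replace the `impact > effort` test with a lower threshold.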
The Continuous Optimization Process
Here are three categories of optimization along with several examples of each.
Category 1: Remove
These are the easiest ideas that produce the most cost savings.
- Remove unused applications. Determine whether the application is really needed. If not, delete all infrastructure and data associated with it.
- Remove unused instances. Look for instances that are no longer used, and then shut them down. Amazon CloudWatch metrics can be a useful starting point to discover idle instances.
- Remove unused storage volumes. Volumes that are no longer attached to any instance (orphan volumes) are almost never needed. A helpful policy is to require that any orphan volume still in use carry a tag specifying who needs it and why. Verify that the remaining orphan volumes are not needed, then remove them.
- Remove unused snapshots. Storage and instance snapshots accumulate when there isn’t an active process to remove them. Determine what is needed and remove the rest.
- Reallocate or sell unutilized Reserved Instances (RIs). AWS Cost Explorer is a great tool for finding unused RIs. Either move an on-demand instance to an instance type that is covered by the unused RI, or resell the RI on the Amazon EC2 Reserved Instance Marketplace.
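As one example from this category, the orphan volume tagging policy could be checked with a small script. The sketch below is a starting point rather than a finished tool. It assumes the volume records have the shape returned by the EC2 `describe_volumes` API (`VolumeId`, `Attachments`, `Tags`), and the tag keys `owner` and `reason` are hypothetical names for the "who needs it and why" tags. The sample data is made up.

```python
def find_untagged_orphans(volumes, required_tags=("owner", "reason")):
    """Return IDs of unattached volumes that lack the required tags.

    `volumes` is a list of dicts shaped like the entries in the "Volumes"
    list of an EC2 describe_volumes response.
    """
    orphans = []
    for vol in volumes:
        if vol.get("Attachments"):
            continue  # still attached to an instance, so not an orphan
        # Volumes without any tags may omit the "Tags" key entirely.
        tag_keys = {tag["Key"].lower() for tag in vol.get("Tags", [])}
        if not all(key in tag_keys for key in required_tags):
            orphans.append(vol["VolumeId"])
    return orphans


# Made-up sample data for illustration.
volumes = [
    {"VolumeId": "vol-0001", "Attachments": [{"InstanceId": "i-0001"}]},
    {"VolumeId": "vol-0002", "Attachments": []},
    {
        "VolumeId": "vol-0003",
        "Attachments": [],
        "Tags": [
            {"Key": "owner", "Value": "data-team"},
            {"Key": "reason", "Value": "audit hold"},
        ],
    },
]

print(find_untagged_orphans(volumes))
```

The result is a list of candidates for review, not a deletion list. Someone should still verify that each volume is not needed before removing it.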
Category 2: Resize
Everything that can’t be deleted should be evaluated to ensure it isn’t overprovisioned.
- Resize instances. Use Amazon CloudWatch metrics to determine which instances can be downsized.
- Resize storage volumes. Look at storage volume utilization and reduce any unnecessary free space. Re-evaluate any overgenerous free space policies that were carried over from on-premises. It’s possible to be far more efficient with storage in AWS.
- Reduce performance of storage volumes. Use Amazon CloudWatch metrics to determine if Provisioned IOPS volumes can be detuned or moved to less expensive non-PIOPS volumes.
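A first pass at finding downsizing candidates can be as simple as comparing peak utilization against a threshold. In the sketch below, the input is assumed to be CPU utilization samples (in percent) already collected per instance, for example from Amazon CloudWatch. The 40 percent threshold, the function name, and the sample data are all hypothetical and should be tuned for each workload.

```python
def downsize_candidates(cpu_samples, peak_threshold=40.0):
    """Return instance IDs whose peak CPU utilization stays below the threshold.

    `cpu_samples` maps an instance ID to a list of CPU utilization
    percentages observed over the review period.
    """
    return sorted(
        instance_id
        for instance_id, samples in cpu_samples.items()
        # Instances with no data are skipped rather than guessed at.
        if samples and max(samples) < peak_threshold
    )


# Made-up sample data for illustration.
cpu_samples = {
    "i-0001": [5.0, 12.5, 8.0],    # consistently idle
    "i-0002": [35.0, 92.0, 60.0],  # has real peaks
    "i-0003": [],                  # no metrics collected
}

print(downsize_candidates(cpu_samples))
```

CPU is only one signal. Memory, network, and disk activity should be checked as well before an instance is resized, and the review period should be long enough to include the busiest part of the workload's cycle.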
Category 3: Refactor
This category should be done less frequently, as it takes more effort and is less likely to produce results after the first pass. However, the first pass will likely produce significant results, so this step should be done at least once. After that, a quarterly or annual review is a reasonable cadence. Look at each application and ensure that the architecture is as efficient as possible. If needed, ask your AWS account team to perform an AWS Well-Architected Review.
As you can see, managing costs in AWS can be a simple and productive process. I encourage you to experiment with these ideas and invent new ones. I hope you found this guide helpful. Please feel free to reach out to me if you have any questions.