Cost Control Blog Series #3: How to Handle Cost Shock
One of the biggest benefits of the cloud is the ability to scale dynamically and reduce costs by paying only for what you use, when you need it. However, when development teams aren’t prepared for this flexibility, unexpected cost spikes, and the cost shock that follows them, can occur. Cost shock is the realization that you have spent considerably more on the cloud than you initially planned. It primarily affects two groups: people in finance roles, who set and project budgets for cloud spend, and developers, who deploy and maintain cloud resources and tooling. If you are new to the topic, feel free to read the first two posts of our cost control blog series. In this blog post, we will cover how to identify, remediate, and prevent cost shock.
Identifying Cost Shock
To identify cost shock, developers and finance professionals first need a way to view their organization’s costs in aggregate, as well as by the dimensions that matter to the business, such as account, business unit, or cost center. The following native AWS services can help you identify cost shock.
AWS Cost Explorer
For resource owners, product owners, or financial professionals responsible for budgeting cloud spend, AWS Cost Explorer provides an easy-to-use interface that lets you visualize, understand, and manage your AWS costs and usage over time.
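Cost Explorer data is also available programmatically through the GetCostAndUsage API. The sketch below, which assumes you call that API through boto3’s `ce` client, builds a request for daily unblended cost grouped by service and then totals a response per service. `build_cost_request` and `cost_by_service` are hypothetical helper names, not part of any AWS SDK.

```python
from datetime import date


def build_cost_request(start: date, end: date) -> dict:
    """Build keyword arguments for Cost Explorer's GetCostAndUsage API,
    asking for daily unblended cost grouped by service."""
    return {
        # The End date is exclusive in Cost Explorer time periods.
        "TimePeriod": {"Start": start.isoformat(), "End": end.isoformat()},
        "Granularity": "DAILY",
        "Metrics": ["UnblendedCost"],
        "GroupBy": [{"Type": "DIMENSION", "Key": "SERVICE"}],
    }


def cost_by_service(response: dict) -> dict[str, float]:
    """Sum UnblendedCost per service across every time period in a response,
    returning services ordered from most to least expensive."""
    totals: dict[str, float] = {}
    for period in response["ResultsByTime"]:
        for group in period["Groups"]:
            service = group["Keys"][0]
            amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
            totals[service] = totals.get(service, 0.0) + amount
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))
```

With credentials configured, you might call `boto3.client("ce").get_cost_and_usage(**build_cost_request(start, end))` and pass the result to `cost_by_service`. Long time ranges can return a `NextPageToken` for pagination, which this sketch ignores.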
AWS Cost Anomaly Detection
AWS Cost Anomaly Detection uses machine learning to detect anomalies in your spend trends and can be configured to alert you when it identifies anomalous spend. It also helps you identify the root causes of that spend so you can act quickly.
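The model behind Cost Anomaly Detection isn’t something you implement yourself, but the basic idea of flagging spend that departs sharply from recent history can be illustrated with a much simpler, swapped-in technique. The sketch below uses a plain z-score test against a trailing window of daily costs; `flag_spend_anomalies`, the 7-day window, and the threshold of 3 are illustrative assumptions and do not describe how the AWS service works.

```python
from statistics import mean, stdev


def flag_spend_anomalies(daily_costs: list[float], window: int = 7,
                         threshold: float = 3.0) -> list[int]:
    """Return the indexes of days whose cost is more than `threshold` standard
    deviations above the mean of the preceding `window` days.

    A simple z-score heuristic for illustration only; it is NOT the machine
    learning model used by AWS Cost Anomaly Detection."""
    if window < 2:
        raise ValueError("window must be at least 2 to compute a standard deviation")
    anomalies = []
    for i in range(window, len(daily_costs)):
        history = daily_costs[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any increase as anomalous.
            if daily_costs[i] > mu:
                anomalies.append(i)
        elif (daily_costs[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies
```

For example, a week of days costing roughly $100 followed by a $350 day gets that day flagged. A fixed z-score ignores patterns such as weekday versus weekend usage, which is one reason to rely on a managed service rather than a heuristic like this.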
AWS Budgets
With AWS Budgets, you can set a budgeted amount, either for total spend or for a specific dimension of spend (such as service or account), over a daily, monthly, or quarterly period, and then configure AWS Budgets to alert you as your actual or forecasted spend approaches that amount. This helps you stay on track with how much you intended to spend on AWS.
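Budgets can also be created programmatically. The following sketch builds keyword arguments for the AWS Budgets CreateBudget API (for example, via boto3’s `budgets` client): a monthly cost budget with email alerts on both actual and forecasted spend. The account ID, budget name, email address, and 80% threshold are placeholders, and `build_monthly_budget` is a hypothetical helper.

```python
def build_monthly_budget(account_id: str, name: str, limit_usd: float,
                         email: str, alert_percent: float = 80.0) -> dict:
    """Build keyword arguments for the AWS Budgets CreateBudget API: a monthly
    cost budget that emails `email` when actual or forecasted spend exceeds
    `alert_percent` percent of the limit."""

    def notification(kind: str) -> dict:
        # kind is "ACTUAL" or "FORECASTED".
        return {
            "Notification": {
                "NotificationType": kind,
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": alert_percent,
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [{"SubscriptionType": "EMAIL", "Address": email}],
        }

    return {
        "AccountId": account_id,
        "Budget": {
            "BudgetName": name,
            # The API expects the amount as a string.
            "BudgetLimit": {"Amount": f"{limit_usd:.2f}", "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        "NotificationsWithSubscribers": [notification("ACTUAL"),
                                         notification("FORECASTED")],
    }
```

With credentials configured, the result could be passed as `boto3.client("budgets").create_budget(**build_monthly_budget(...))`. Alerting on the forecast as well as actual spend gives you warning before the month’s spend has already landed.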
You can find more detail on these native services on the Services tab of the “Cloud Financial Management with AWS” page. In addition to the managed offerings above, AWS also provides the following solutions to help you meet your cost visibility requirements:
- AWS Cost and Usage Report (CUR)
- Cost Solutions Dashboards: Cost Intelligence Dashboard, CUDOS Dashboard
- AWS Cost Partners
These give you and your stakeholders more granular visibility into your costs.
Remediating Cost Shock
Once you’ve identified the cost shock, you will want to remediate it. To do so effectively, focus on the following activities.
Bring in the right people
A good question to ask your team is: “Is this resource worth the cost, or do we need to make it more cost-effective?” To answer it, and to stop the source of your cost shock, you need the right stakeholders involved. The finance or DevOps individuals who identified the cost shock need to work with the owners of the resources causing it. In some cases, it may be as simple as notifying resource owners that they’ve launched a resource that is causing a dramatic change in cost, and they terminate it. In other cases, the resource was created for a specific use case and can’t simply be terminated. When the cause of your cost shock can’t quickly be turned off because it supports a critical use case, look for ways to meet the use case’s requirements while optimizing for cost, such as rightsizing, scheduling, or re-architecting.
Confirm the true source of cost shock
What if resource owners can’t understand why their resource or service is suddenly driving up cost? Encourage them to think holistically about what the resource or AWS service is being used for. AWS resources often produce data that is consumed by other applications or third-party services, and changes in how those applications and services consume it can drive a change in cost. For example, if a downstream application starts reading from an Amazon S3 bucket far more often, the bucket’s request and data transfer charges rise even though nothing about the bucket itself changed.
To find the source of the costs, follow the breadcrumbs. Cost Explorer lets you narrow the costs down to a service. If you need to go deeper, your CUR is a great tool. Use the product_product_name field to see the product, line_item_usage_type to see what type of usage incurred the cost, and line_item_resource_id to narrow it down to the exact resource. You can then find the resource in your account that is responsible for the change in cost and make the necessary changes. To learn more about using your CUR to find costs, check out the Well-Architected CUR Query Library.
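The CUR drill-down above can be expressed as a single query that ranks resources by cost within one product. To keep the sketch self-contained, it runs against an in-memory SQLite table; it assumes your CUR is queryable with the column names above plus `line_item_unblended_cost` (as in the Amazon Athena integration), and the sample rows are made up. The `:product` and `:limit` placeholders are SQLite parameter syntax; in Athena you would substitute literal values.

```python
import sqlite3

# Rank resources by total unblended cost within one product.
TOP_RESOURCES_SQL = """
SELECT product_product_name,
       line_item_usage_type,
       line_item_resource_id,
       SUM(line_item_unblended_cost) AS cost
FROM cur
WHERE product_product_name = :product
GROUP BY product_product_name, line_item_usage_type, line_item_resource_id
ORDER BY cost DESC
LIMIT :limit
"""


def load_sample(rows: list[tuple]) -> sqlite3.Connection:
    """Create an in-memory table standing in for a CUR table (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cur (product_product_name TEXT, line_item_usage_type TEXT, "
        "line_item_resource_id TEXT, line_item_unblended_cost REAL)"
    )
    conn.executemany("INSERT INTO cur VALUES (?, ?, ?, ?)", rows)
    return conn


def top_resources(conn: sqlite3.Connection, product: str, limit: int = 5) -> list[tuple]:
    """Return the costliest (product, usage type, resource ID, cost) rows for one product."""
    return conn.execute(TOP_RESOURCES_SQL, {"product": product, "limit": limit}).fetchall()
```

Grouping by usage type as well as resource ID separates, for example, an instance’s compute hours from its other usage, which helps when a single resource shows several kinds of charges.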
Preventing Cost Shock
Now that we have found what caused the cost shock, we want to prevent it from happening again as far as possible. Building on budget alerts, AWS released AWS Budgets Actions, which, depending on the budget threshold you set, let you stop new resource creation in an account and therefore stop costs from growing. Discuss any actions with developers before creating them to confirm that they won’t affect live sites, and have these conversations early to make sure the actions work for all parties involved.
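A budget action needs something to apply once the threshold is crossed. One option is an IAM policy that denies resource-creating API calls; the sketch below generates such a policy document. The three default actions are illustrative assumptions, and `build_deny_new_resources_policy` is a hypothetical helper. You would attach the resulting policy through a budget action of type `APPLY_IAM_POLICY` (an SCP can be applied with `APPLY_SCP_POLICY` instead), and choosing the manual approval model is one way to keep developers in the loop before anything is blocked.

```python
import json
from typing import Optional

# Hypothetical default: API actions that create billable resources.
DEFAULT_DENIED_ACTIONS = [
    "ec2:RunInstances",
    "rds:CreateDBInstance",
    "sagemaker:CreateNotebookInstance",
]


def build_deny_new_resources_policy(actions: Optional[list[str]] = None) -> str:
    """Return an IAM policy document (as JSON) that denies the given
    resource-creating actions; intended for a budget action to attach."""
    denied = list(actions) if actions is not None else list(DEFAULT_DENIED_ACTIONS)
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyNewResourcesOverBudget",
                "Effect": "Deny",
                "Action": denied,
                "Resource": "*",
            }
        ],
    }
    return json.dumps(policy, indent=2)
```

Because the policy only denies creation calls, existing resources keep running, which is why confirming the list with developers matters: a denied action that a deployment pipeline relies on can still break a release.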
Often, we see customers experience cost shock from something that comes out of left field rather than from their normal infrastructure. Someone might accidentally deploy to the wrong Region and forget about it, or people might want to try out a new AWS service without planning for its costs. Setting up governance around unintentional usage can help avoid this. Service control policies (SCPs) are a type of AWS Organizations policy. They let you define guardrails, or set limits, on the actions that an account’s administrator can delegate to the IAM users and roles in the affected accounts. This way you can, by default, ban or require approval for resources that your team doesn’t commonly use or that aren’t planned for in your budget.
Some examples are:
- Rarely used EC2 instance types, such as ix, 5a, f*
- Regions you are not going to deploy services in: block access to them so resources can’t be built there and forgotten about
- Specialized services with higher costs, such as Amazon Rekognition or Amazon SageMaker, that may need approval before being used
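The guardrails listed above can be combined into one SCP. The sketch below generates such a policy; the allowed Regions, instance-type patterns, service prefixes, and exempt role are placeholders you would replace with your own, and `build_guardrail_scp` is a hypothetical helper. The `NotAction` list on the Region statement is an abbreviated assumption: AWS’s published Region-restriction example exempts a longer list of global services, and you should start from that. The optional `exempt_role_arn` shows one way to let a principal approved through a review process bypass the guardrails.

```python
from typing import Optional


def build_guardrail_scp(allowed_regions: list[str],
                        denied_instance_types: list[str],
                        denied_services: list[str],
                        exempt_role_arn: Optional[str] = None) -> dict:
    """Build an SCP that denies unapproved Regions, rarely used EC2 instance
    types, and high-cost services, optionally exempting one approved role."""
    statements = [
        {
            "Sid": "DenyUnapprovedRegions",
            "Effect": "Deny",
            # Abbreviated list of global services to exempt (assumption); start
            # from AWS's published Region-restriction example for a full list.
            "NotAction": ["iam:*", "organizations:*", "sts:*", "support:*",
                          "budgets:*", "ce:*"],
            "Resource": "*",
            "Condition": {"StringNotEquals": {"aws:RequestedRegion": allowed_regions}},
        },
        {
            "Sid": "DenyRareInstanceTypes",
            "Effect": "Deny",
            "Action": "ec2:RunInstances",
            "Resource": "arn:aws:ec2:*:*:instance/*",
            # StringLike allows wildcard patterns such as "f1.*".
            "Condition": {"StringLike": {"ec2:InstanceType": denied_instance_types}},
        },
        {
            "Sid": "DenyServicesPendingApproval",
            "Effect": "Deny",
            "Action": [f"{service}:*" for service in denied_services],
            "Resource": "*",
        },
    ]
    if exempt_role_arn:
        # Condition operators in one block are ANDed, so each Deny now applies
        # only when the caller is NOT the approved role.
        for statement in statements:
            statement["Condition"] = {
                **statement.get("Condition", {}),
                "ArnNotLike": {"aws:PrincipalArn": exempt_role_arn},
            }
    return {"Version": "2012-10-17", "Statement": statements}
```

For example, `build_guardrail_scp(["eu-west-1"], ["f1.*"], ["rekognition", "sagemaker"])` produces a policy you could serialize with `json.dumps` and create in AWS Organizations. Keeping the exemption to a single reviewed role, rather than loosening the deny statements, means a request and approval process stays the only way around the guardrails.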
Example SCPs that you can deploy through AWS Organizations are available in the AWS Organizations documentation.
This is a good place to start. The key is to establish a request and approval process: developers need a way to request access to denied services. For example, when a project must be developed in a Region that isn’t the norm, the SCP will deny it at first, but there must be a process in place through which that request can be reviewed. With all of these elements, we don’t want to stop people from developing; we just want to make sure we avoid anomalies. Document your policies and make developers aware of where to find them, as it’s not always clear to a developer why they are being restricted. This is the ‘Restrict and Review’ approach, which can be viewed as setting guardrails. Moving forward, consider implementing an ‘Educate and Allow’ approach, in which developers are educated on how to make cost-aware decisions.
Infrastructure Pricing Estimation
Sometimes, cost shock happens because costs weren’t estimated accurately. Make sure you understand the full picture before building in the cloud by using tools such as the AWS Pricing Calculator to create an accurate estimate. As you start to build out your infrastructure, it is good practice to check that what you’ve built matches your estimate. If you use AWS CloudFormation to build your infrastructure, it can provide an estimate of how much a stack will cost to deploy. This is an opportunity to check in and make sure your costs stay on track as things change and new requirements arise. These two tools are key for big-ticket items like Amazon RDS or Amazon EC2.

It is easy to forget the little things that add up in cloud costs. The AWS guide on avoiding unexpected charges lists things to keep an eye on, and we recommend reviewing it with all stakeholders to keep communication about costs open. Items like data transfer, AWS Config, or Amazon CloudWatch may contribute to cost shock because early budgets didn’t take them into account; if they come up as high-spend areas, review them for savings. It is important that the finance team and developers communicate when setting budgets so they are on the same page.
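Checking what you built against your estimate can be as simple as comparing two per-service maps. The sketch below assumes you have monthly estimated costs (for example, copied from a Pricing Calculator estimate) and actual costs (for example, from Cost Explorer); `compare_to_estimate`, the 10% tolerance, and the service names are illustrative assumptions. Services with spend but no estimate are reported separately, which is where items like data transfer or CloudWatch tend to show up.

```python
def compare_to_estimate(estimate: dict[str, float], actual: dict[str, float],
                        tolerance: float = 0.10) -> dict[str, str]:
    """Compare monthly cost per service against a pricing estimate.

    Returns a status for every service in either mapping: 'unbudgeted' if it
    has spend but no estimate, 'over' if actual spend exceeds the estimate by
    more than `tolerance` (a fraction, so 0.10 means 10%), otherwise 'ok'."""
    report = {}
    for service in sorted(set(estimate) | set(actual)):
        planned = estimate.get(service, 0.0)
        spent = actual.get(service, 0.0)
        if planned == 0.0:
            report[service] = "unbudgeted" if spent > 0 else "ok"
        elif spent > planned * (1 + tolerance):
            report[service] = "over"
        else:
            report[service] = "ok"
    return report
```

Sharing a report like this with both finance and developers gives the two groups the same view of where the estimate and reality diverge.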
One final element of preventing cost shock is driving a cost-aware culture in your organization. We often see cost shock arise when customers incur charges they weren’t expecting. If the people developing in the cloud can see how much they are spending and understand what will happen when they hit the deploy button, that surprise can be avoided. Using the solutions mentioned above to empower others to see costs for themselves and take ownership of them will have a huge impact on your spend. One of the best ways to achieve this is open communication with all stakeholders, so you can share the information they need to do their jobs, including the granularity, frequency, and format of reports. For example, developers will likely be more interested in their costs per resource or per account, while managers may care more about costs per project. Ask these questions so everyone can help prevent cost shock.