Strategy for Efficient Cloud Cost Management
One of the most common questions I get from customers is how to effectively manage their cloud costs. Amazon Web Services (AWS) offers many programs to help customers evaluate and optimize their spending. The Cloud Economics team is fantastic at providing a detailed, holistic analysis of your environment and can identify areas of optimization and improvement. If you have not taken advantage of them, I strongly suggest you do.
But what do you do the day after they leave? What do you do day to day? In my experience as a CIO, I’ve found that effective management of cloud costs depends more on the mechanisms you use to manage those costs on a regular basis than on any one-time event.
In this post, I’ll share with you six ways to build capability and processes to help manage your cloud costs.
It is somewhat obvious, and you have probably heard it before: you can’t govern what you can’t measure. You might be surprised how many organizations don’t have an effective process for creating transparency and understanding their costs. I occasionally encounter customers who are worried about their cloud costs, only to find that most of them are taking a very reactive posture. They often look at their expenses only when they get their bill at the close of each month. This is akin to driving your car by looking in the rearview mirror.
In order to create transparency, you have to invest in the tools and capabilities to process the data, visualize it, and derive insights from it. You need to be able to proactively dive deep into the data and go beyond the surface analysis that the monthly bills provide.
There are numerous tools out there that focus specifically on this area, including some from AWS and some from our partners. Here are just a few:
- AWS Cost Explorer: detailed tool to look back at your AWS cost over time
- AWS Budgets: great way to set budgets and be alerted when usage is out of bounds
- CloudHealth: comprehensive solution for budgeting, tracking, and reviewing your AWS costs
- Apptio: great platform to provide financial accountability and transparency for your AWS costs
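As a rough illustration of going beyond the monthly bill, the sketch below pulls daily cost per service from the AWS Cost Explorer API through boto3 and totals it. It assumes boto3 is installed and credentials with Cost Explorer access are configured; the helper names `fetch_daily_costs` and `cost_by_service` are made up for this example, and pagination via `NextPageToken` is deliberately left out.

```python
from collections import defaultdict
from decimal import Decimal


def fetch_daily_costs(start, end):
    """Ask Cost Explorer for daily unblended cost, grouped by service.

    start and end are ISO dates such as "2024-01-01"; end is exclusive.
    """
    import boto3  # imported lazily so cost_by_service works without AWS access

    client = boto3.client("ce")
    return client.get_cost_and_usage(
        TimePeriod={"Start": start, "End": end},
        Granularity="DAILY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
    )


def cost_by_service(response):
    """Sum the per-day amounts in a get_cost_and_usage response by service."""
    totals = defaultdict(Decimal)
    for day in response.get("ResultsByTime", []):
        for group in day.get("Groups", []):
            service = group["Keys"][0]
            amount = group["Metrics"]["UnblendedCost"]["Amount"]  # a string
            totals[service] += Decimal(amount)
    return dict(totals)
```

Calling `cost_by_service(fetch_daily_costs("2024-01-01", "2024-01-08"))` would give one total per service for that week, which is the kind of figure you can chart or compare against a budget.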
When creating transparency regarding your cloud costs, the old saying “garbage in, garbage out” comes to mind. More often than not, the problem is missing data as well as bad data. To address this, insist on tagging all assets in your environment. When I was CIO, I went so far as to issue an edict that every night at midnight, all untagged assets would be deleted. Clearly, you should give your team a heads up before doing this. But without a robust tagging strategy that clearly describes the assets and makes ownership and accountability clear, you won’t be able to accurately record and interpret your cloud costs.
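A tagging policy is easier to enforce when the check is automated. Here is a minimal sketch of such a check, assuming you have already exported an inventory of resources as dictionaries with an `id` and a `tags` mapping. The required tag names and the inventory shape are hypothetical; substitute whatever your own tagging strategy defines.

```python
# Hypothetical policy: every resource must carry these tag keys.
REQUIRED_TAGS = frozenset({"owner", "product", "cost-center"})


def missing_tags(resource, required=REQUIRED_TAGS):
    """Return the required tag keys that are absent or blank on a resource."""
    tags = resource.get("tags") or {}
    present = {key for key, value in tags.items() if value and value.strip()}
    return sorted(set(required) - present)


def find_untagged(resources, required=REQUIRED_TAGS):
    """Map each non-compliant resource id to the tags it is missing."""
    report = {}
    for resource in resources:
        gaps = missing_tags(resource, required)
        if gaps:
            report[resource["id"]] = gaps
    return report
```

Running a report like this on a schedule, and sending it to owners before anything is deleted, gives teams the heads up mentioned above.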
Make Engineering Teams Accountable
I suggest using the two-pizza team model and encouraging your product teams to own the financial responsibility for their products. It is a mistake to think you can optimize from the center with a well-intentioned but misinformed centralized cloud cost optimization team. Engineers who take ownership of their products will think about costs when designing solutions. And because they are both the people with the most knowledge of the system and the people with the most to lose if there is a problem, they will make informed decisions about how to optimize the environment. Cost awareness must become part of the engineering culture, and it starts with developing an understanding of billing data. That’s where the transparency and cloud cost management tools come in.
But I have found that assigning responsibility and having tools is often not enough. To overcome this, I have borrowed a model from sales: How do you make the year? You make the quarter. How do you make the quarter? You make the month…and so on. In support of this, I established a process in which, using the cloud cost management tools, the product owner for each product received an email every morning showing what they had spent on their products the day before versus what was budgeted. This let the product owner know whether to make adjustments during the remainder of the week to deal with cost overruns. It also empowered the teams to try out new solutions while giving them the transparency and accountability to deal with any overruns. Building on this model, the product owners met with the directors of development on a weekly basis to confirm that they were within budget or had remediation plans to address any overruns. The directors likewise met with the VPs on a biweekly basis to do the same. This meant that by the time we got our monthly bill, we pretty much knew exactly how much it was going to be and had taken the necessary steps throughout the month to ensure we were on budget.
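The morning email described above can be sketched in a few lines. This is an illustration only, not the tooling I actually used: the `DailySpend` record, its field names, and the message layout are assumptions, and sending the message (for example through your mail service of choice) is left out.

```python
from dataclasses import dataclass


@dataclass
class DailySpend:
    product: str
    owner_email: str
    budgeted: float  # yesterday's budget in USD
    actual: float    # yesterday's actual spend in USD


def summarize(spend: DailySpend) -> str:
    """Build the text of the daily budget-versus-actual message."""
    variance = spend.actual - spend.budgeted
    # Guard against a zero budget so the percentage never divides by zero.
    pct = variance / spend.budgeted * 100 if spend.budgeted else 0.0
    status = "OVER budget" if variance > 0 else "within budget"
    return (
        f"To: {spend.owner_email}\n"
        f"Subject: {spend.product} daily spend: {status}\n\n"
        f"Budgeted: ${spend.budgeted:,.2f}\n"
        f"Actual:   ${spend.actual:,.2f}\n"
        f"Variance: ${variance:+,.2f} ({pct:+.1f}%)\n"
    )
```

Putting the status in the subject line is a deliberate choice: a product owner can tell at a glance whether the message needs attention today.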
Train Teams to Think about Cost
This might sound like a lot to put on these product owners and directors. And to an extent, it is. But we’re in a brave new world, and it calls for building new skills. AWS has multiple programs that can help your teams better understand cloud costs and how to manage them, including our “Introduction to AWS Billing and Cost Management” and “Well-Architected Labs Cost Optimization” videos. I strongly encourage you to have as many members of your team as possible go through these, including your finance teams.
Plan, Forecast, Budget, Learn
One of the challenges with a cloud financial model is the potential for variability. Now remember, we’re managing costs on a daily basis, so costs should only be variable by design and not surprise us. We did sign up for being agile, testing out new innovations, and scaling up successful solutions, right?
But you need to get comfortable with continually right-sizing your cloud budget. Make sure you have a good partnership with Finance, and use the transparency and detailed controls you have in place to work with them to adjust your budget as your needs change. Likewise, help them appreciate the cost savings and efficiency that come from the teams managing the costs daily.
You will get better at forecasting growth as you learn more about how to model your workloads. But I stress that it is important to create forecasts even though early on you expect a wide margin of error. This is because they’re fundamental to setting macro-level expectations of your cloud spending and a foundation to learn from. Without such forecasts, you won’t be able to understand whether you’re spending less or more than you expected, and you won’t be able to improve your forecasting ability.
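One simple way to learn from forecasts is to record how far off each one was and watch that error shrink over time. The sketch below assumes you keep monthly (forecast, actual) pairs; the function names and the choice of mean absolute percentage error are mine, not a prescribed method.

```python
def forecast_error(forecast, actual):
    """Signed error as a fraction of the forecast (positive means overspend)."""
    return (actual - forecast) / forecast


def mean_absolute_error(history):
    """Average size of the error across (forecast, actual) pairs."""
    errors = [abs(forecast_error(f, a)) for f, a in history]
    return sum(errors) / len(errors)
```

For example, a month forecast at 100 that comes in at 125 has an error of 0.25, or 25 percent over. Comparing the average error from quarter to quarter shows whether your forecasting is actually improving.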
I am not going to cover all the possible metrics you should be gathering and using, nor am I going to focus on any operational metrics. Rather, I want to focus on a few simple metrics I used that helped me understand my company’s financial health and manage control over my cloud costs.
- Budget vs. actual: Budgeted spend for a period vs. actual spend. This helps establish accountability, appreciation, and ability to investigate spend in order to manage budget and perform cost optimization.
- Average daily spend: Actual daily spend. I knew how much my organization should be spending every day (it changed throughout the year as our volumes increased and the business grew), but I could look at the daily figure and quickly judge if we were on track or not. Not only did I know it, but I expected my leaders and product owners to know it.
- Provisioned capacity vs. utilized capacity: If provisioned capacity is considerably higher than utilization, it’s a sign the environment could be inefficient and optimizations should be made. Again, the Cloud Economics team can help with a detailed analysis on this.
- Spend on unused resources: Every organization has unused resources (EBS volumes seem to be one of the biggest culprits here). Sometimes they’re necessary as you need snapshots for various things, but in general, you should spin down resources you don’t need. How much is being spent on them is a good indicator of cloud fiscal management.
- Reserved instance vs. on-demand ratio: Unless you have a very static and predictable business, you are unlikely to have all reserved instances. You are always going to have a fleet of on-demand instances to deal with elastic spikes and drops in your environment. The right ratio depends on many things: the volatility of your business, your risk tolerance, and your financial objectives, to name a few. But what I found was that the right ratio tended not to change much over time (if at all). Therefore, when it did change, it might be a sign that I should convert some of the on-demand instances into reserved instances, as my business had grown and the floor that I was operating on had risen.
- Cost of untagged resources: This in theory should be zero, as we are deleting them every night. But if you don’t do that, at least understand how much you are spending. It is likely that no one is managing these resources so they’re just burning money away.
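Most of the metrics above reduce to simple arithmetic once the cost data is in hand. The following is a minimal sketch under stated assumptions: costs arrive as a list of line items, each a dictionary with `cost`, `tags`, and `in_use` fields. That shape, and every function name here, is hypothetical rather than the format of any AWS report.

```python
def budget_vs_actual(budgeted, actual):
    """Positive result means spend exceeded the budget for the period."""
    return actual - budgeted


def average_daily_spend(daily_costs):
    """Mean of a non-empty list of daily cost figures."""
    return sum(daily_costs) / len(daily_costs)


def utilization_ratio(provisioned, utilized):
    """Share of provisioned capacity actually used (1.0 means fully used)."""
    return utilized / provisioned


def reserved_share(reserved_hours, on_demand_hours):
    """Fraction of instance hours covered by reserved instances."""
    total = reserved_hours + on_demand_hours
    return reserved_hours / total if total else 0.0


def cost_where(line_items, predicate):
    """Total cost of the line items that match a condition."""
    return sum(item["cost"] for item in line_items if predicate(item))


def untagged_cost(line_items):
    return cost_where(line_items, lambda item: not item.get("tags"))


def unused_cost(line_items):
    return cost_where(line_items, lambda item: not item.get("in_use", True))
```

Tracking `reserved_share` over time is what surfaces the shift described above: if it drifts downward while spend grows, the on-demand floor has probably risen.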
Last, but Not Least: Optimize
There are many good optimization recommendations. Instead of listing them all here, I would like to direct you to a blog post from the Cloud Economics team on cost optimization best practices.
I think you will find managing your cloud costs can actually be relatively straightforward. If you implement some of these recommendations, you’ll surely be able to achieve your objectives.