Billing and Cost Optimizations Essentials
GETTING STARTED GUIDE
When using the cloud to host your applications and systems, it is important to understand how the billing model works, and how you can optimize your costs. The cloud allows you to trade fixed expenses (such as data centers and physical servers) for variable expenses, and only pay for the resources as you consume them. There are a number of different billing dimensions, depending on the type of resource you use. These can range from the amount of time a resource is running; how much data is stored, transferred, or processed; or the number of API invocations made.
How do I view my costs?
The first step in understanding your billing is to be able to see what resources you have, and what they cost to run. The AWS Billing Dashboard in the AWS Management Console shows the high-level overview of your current monthly costs, along with a forecast based on the currently running resources. If you are not yet familiar with the console, we recommend reading the Getting Started with the AWS Management Console tutorial. Billing information is considered sensitive, and as such, only the root user of any AWS account initially has access to this section of the console. If you are unable to view the billing dashboard, you will need to delegate access to your AWS IAM user. Here is an example of the billing dashboard, showing which services are currently in use, and what they cost:
While the billing dashboard gives you an overview and a high-level breakdown of costs, you may want to look at costs in more detail. This can be done by using AWS Cost Explorer. It allows you to see which services were used, and the amount each contributed to your monthly spend. It also graphs these values, and you can filter the results across a number of different dimensions, such as Region or service. Here is an example of Cost Explorer:
Can I get an alert when my spending projection is above a certain amount?
Yes! This should be one of the first things you set up when creating a new AWS account. Follow the instructions in the Amazon CloudWatch User Guide to set up an alert based on total estimated charges. You can set additional alarms based on a number of metrics or dimensions, depending on your needs.
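As a rough sketch of what such an alarm can look like in code, the example below builds the arguments for the CloudWatch `put_metric_alarm` call on the `EstimatedCharges` metric. It assumes the boto3 SDK for Python is installed and that billing alerts are already enabled in the account's billing preferences. The function names, the alarm name, the six-hour period, and the SNS topic ARN are illustrative assumptions, not values from this guide.

```python
def build_billing_alarm_params(threshold_usd, sns_topic_arn):
    """Return keyword arguments for CloudWatch's put_metric_alarm call."""
    return {
        "AlarmName": f"estimated-charges-over-{threshold_usd}-usd",  # hypothetical name
        "Namespace": "AWS/Billing",
        "MetricName": "EstimatedCharges",
        "Dimensions": [{"Name": "Currency", "Value": "USD"}],
        "Statistic": "Maximum",
        "Period": 21600,  # six hours, in seconds (an assumed evaluation window)
        "EvaluationPeriods": 1,
        "Threshold": float(threshold_usd),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # SNS topic that receives the alert
    }


def create_billing_alarm(threshold_usd, sns_topic_arn):
    """Create the alarm; needs boto3 and valid AWS credentials."""
    import boto3  # imported here so the builder above works without boto3

    # Billing metrics are published in the us-east-1 Region.
    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
    cloudwatch.put_metric_alarm(
        **build_billing_alarm_params(threshold_usd, sns_topic_arn)
    )


# Hypothetical topic ARN, used only to show the shape of the parameters.
params = build_billing_alarm_params(
    50, "arn:aws:sns:us-east-1:123456789012:billing-alerts"
)
print(params["AlarmName"])
```

Keeping the parameter builder separate from the API call lets you inspect the alarm definition before anything is created in your account.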
What resources are free?
The AWS Free Tier provides customers the ability to explore and try out AWS services free of charge up to specified limits for each service. The Free Tier comprises three different types of offerings: a 12-month Free Tier, an Always Free offer, and short-term trials. Services with a 12-month Free Tier allow customers to use the service for free up to specified limits for one year from the date the account was created. Services with an Always Free offer allow customers to use the service for free up to specified limits as long as they are an AWS customer. Services with a short-term trial are free to use for a specified period of time or up to a one-time limit, depending on the service selected. To see what Free Tier resources you are currently using, and how much of each you are using, open up the Free Tier dashboard under the Billing section of your AWS account. Here is an example of what you will be able to see:
Can I pay a fixed, predictable amount per month?
Services on AWS are usually billed on a per-consumption rate based on different dimensions, such as the length of time the resource is running, amount of data processed or transferred, and number of requests. Some of the services have a Free Tier, and your monthly costs depend on the combination of services that you use.
If you are looking for a solution with a fixed, predictable cost, Amazon Lightsail is a service that offers easy-to-use virtual private server (VPS) instances, containers, storage, databases, and more at a cost-effective monthly price. As an example, follow this tutorial on deploying a WordPress site on Amazon Lightsail.
How can I reduce my monthly bill?
You can reduce your monthly bill in a number of ways. These include optimizing the number or size of the instances and databases you are using, migrating from licensed databases to open-source ones, automatically scaling up and down based on demand, and moving workloads to AWS Lambda or other serverless services that scale down to zero when not in use. Another option is to switch off environments and resources that are not needed 24/7. As an example, there are 168 total hours in a week. If the developers only use the development environment during office hours (8 AM-6 PM [10 hours]), seven days a week, turning it off outside those hours would save you 98 hours per week (~58.33%). To implement this solution, you can look at the Instance Scheduler on AWS to automatically turn off instances and databases on a schedule.
The sections below will cover specific scenarios for using AWS Spot Instances; auto scaling to scale up or down based on load; and optimizing network, compute, and database costs.
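The office-hours arithmetic above can be reproduced with a few lines of Python, which makes it easy to try other schedules. The function name and its parameters are illustrative, and the result counts hours only; actual savings depend on which resources are billed by running time.

```python
HOURS_PER_WEEK = 24 * 7  # 168


def weekly_off_hours(hours_on_per_day, days_on_per_week=7):
    """Return (hours switched off per week, fraction of the week switched off)."""
    hours_on = hours_on_per_day * days_on_per_week
    hours_off = HOURS_PER_WEEK - hours_on
    return hours_off, hours_off / HOURS_PER_WEEK


# The scenario from the text: 10 hours a day, seven days a week.
hours_off, fraction = weekly_off_hours(10, 7)
print(f"{hours_off} hours off per week ({fraction:.2%})")  # 98 hours off per week (58.33%)

# A variation: the environment is only needed on weekdays.
weekday_hours_off, weekday_fraction = weekly_off_hours(10, 5)
```

Restricting the same environment to weekdays raises the switched-off time to 118 hours per week.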
What are Spot Instances?
When you spin up an EC2 instance, it is called an On-Demand Instance, and you are billed per second while it runs. You can also spin it up as an EC2 Spot Instance, which lets you take advantage of unused EC2 capacity in the AWS Cloud. Spot Instances are available at up to a 90% discount compared to On-Demand prices. The caveat is that, because a Spot Instance relies on unused EC2 capacity, it may be interrupted if demand for On-Demand Instances spikes, with a 2-minute warning before this happens. The warning allows you to complete any in-flight requests on the instance and gracefully shut it down, or suspend it until Spot capacity is available again. You can also use a combination of different Spot Instance types to reduce the likelihood of not having enough capacity for your requests - see the next section to learn more.
Automatically scale resources based on demand
Scaling resources up and down based on the current workload enables you to have just enough capacity to handle incoming requests. For Amazon EC2 instances, this can be done by using auto scaling and configuring scaling rules based on metrics such as CPU load, network in/out, number of requests, and more. For workloads where you have predictable spikes, such as a system used by a school where most people log in and start using it at 7 AM, you can configure auto scaling with scheduled or predictive scaling to add capacity ahead of the spike, for example between 6:50 AM and 6:10 PM.
As mentioned in the previous section, Spot Instances are another way to reduce the costs for EC2 instances. Using auto scaling, you can create blended fleets that combine On-Demand and Spot Instances, with options to choose which Spot Instance types to use. As an example, you can configure it to use Spot Instances for both m5.large and m5.xlarge, with a weighting attached to indicate how much capacity each provides. In this case, the m5.xlarge has twice as much compute as the m5.large, so assigning it a weight of 2 and the m5.large a weight of 1 allows auto scaling to decide which type to choose when scaling. The allocation strategy used to choose a Spot Instance type can be set to lowest-price, where it picks the instance type with the lowest per-unit cost as defined by the weights, or capacity-optimized, where it chooses the instance type with the most Spot capacity currently available, reducing the chance of your workload being interrupted.
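To make the difference between the two strategies concrete, here is a minimal sketch of the selection logic described above. It is not how auto scaling is implemented: the `SpotPool` class, both functions, the prices, and the spare-capacity figures are hypothetical, and AWS does not expose Spot capacity as a simple number like this.

```python
from dataclasses import dataclass


@dataclass
class SpotPool:
    instance_type: str
    weight: int          # capacity units this instance type provides
    hourly_price: float  # hypothetical Spot price in USD per hour
    spare_capacity: int  # hypothetical measure of available Spot capacity


def pick_lowest_price(pools):
    """Choose the pool with the lowest price per unit of weighted capacity."""
    return min(pools, key=lambda pool: pool.hourly_price / pool.weight)


def pick_capacity_optimized(pools):
    """Choose the pool with the most spare capacity, ignoring price."""
    return max(pools, key=lambda pool: pool.spare_capacity)


pools = [
    SpotPool("m5.large", weight=1, hourly_price=0.040, spare_capacity=500),
    SpotPool("m5.xlarge", weight=2, hourly_price=0.070, spare_capacity=20),
]

cheapest = pick_lowest_price(pools)            # 0.035 per unit beats 0.040 per unit
most_available = pick_capacity_optimized(pools)
print(cheapest.instance_type, most_available.instance_type)
```

With these made-up numbers the two strategies disagree: the m5.xlarge is cheaper per unit of capacity, while the m5.large pool has more room and is therefore less likely to be interrupted.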
If you are using Lambda functions for your applications, there is an open-source tool AWS Lambda Power Tuning, hosted on GitHub, that helps test code using different Lambda configurations to find the best cost and performance combination for you.
Optimizing compute costs
A good starting point for cost optimization is to analyze your EC2 instance or Lambda function sizes based on the amount of processing they do. Different EC2 instance types are optimized for different workloads, such as high-frequency CPUs, high memory capacity, fast NVMe SSD local storage, attached GPUs for machine learning, and high network throughput. As an example, if you see that your workload is CPU intensive, it may be cheaper to use the C5 family of instances instead of the M5 general-purpose ones. Conversely, if you see only periodic spikes in CPU usage, the burstable T family of instances could be better suited to your workload. To get started optimizing your compute, you can use AWS Compute Optimizer, which analyzes running workloads and makes recommendations.
Optimizing data transfer costs
Transferring data between AWS Regions, Availability Zones (AZs), or between AWS and the internet has an associated cost. You can reduce this cost by designing your infrastructure to route traffic along optimal routes. The first step is to look at VPC endpoints if you are making any calls to AWS services, such as Amazon S3 or Amazon ECR. By creating a VPC endpoint inside your VPC, your calls to the supported AWS services are routed through it, and remain inside the AWS network instead of calling out of the VPC to the internet and then back to the AWS network. This will help you avoid network egress costs.
The next step is to optimize the calls between your infrastructure. When deploying across multiple AZs for resiliency, you can use Availability Zone Affinity to ensure calls are routed as much as possible inside each Availability Zone. If your application uses a database supported by Amazon RDS, you can create read replicas so that read calls are served by the replica in the same Availability Zone as the caller.
Optimizing database costs
To start optimizing database costs, you can deploy additional read replicas to offload queries that only read data from the primary database. This will free up capacity on the primary node, which handles all the data changes (updates, inserts, and deletes). This will help optimize up to the point where you need to start scaling the database instance up or down as load fluctuates. You can change the instance size of an Amazon RDS database, but that requires the database to go offline while the instance type is changed. If your application needs continuous uptime, it is worth using Amazon Aurora, a relational database service that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open-source databases. Aurora is fully compatible with MySQL and PostgreSQL, allowing existing applications and tools to run without requiring modification. Amazon Aurora Serverless is an on-demand, scalable configuration for Aurora where the database automatically starts up, shuts down, and scales capacity up or down based on your application's needs.
How to optimize a fixed workload
When you have optimized your workload as much as you can, and there is a fixed minimum load you need to support, it is worth considering signing up for a Savings Plan. A Savings Plan is a flexible discount model that provides you with the same discounts as Reserved Instances, in exchange for a commitment to use a specific amount (measured in dollars per hour) of compute power over a one- or three-year period. Savings Plans are available in two flavors:
Compute Savings Plans provide the most flexibility and help to reduce your costs by up to 66% (just like Convertible RIs). The plans automatically apply to any EC2 instance regardless of Region, instance family, operating system, or tenancy, including those that are part of Amazon EMR, Amazon ECS, or an Amazon EKS cluster. For example, you can shift from C4 to C5 instances, move a workload from Dublin to London, or migrate from EC2 to AWS Fargate, benefitting from Savings Plan prices along the way, without having to do anything.
EC2 Instance Savings Plans apply to a specific instance family within a Region and provide the largest discount (up to 72%, just like Standard RIs). Just like with RIs, your Savings Plan covers usage of different sizes within the same instance family (such as a c5.4xlarge or c5.large) throughout a Region. You can even switch from Windows to Linux while continuing to benefit, without having to make any changes to your Savings Plan.
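To get a feel for how an hourly commitment interacts with fluctuating usage, here is a simplified model in Python. It assumes a single flat discount rate, that the commitment is paid every hour whether or not it is fully used, and that usage beyond the commitment is billed at On-Demand rates. Real discounts vary by instance type, Region, term, and payment option, so treat the function name and all numbers as illustrative assumptions.

```python
import math


def hourly_cost(on_demand_usage_usd, commitment_usd, discount):
    """Estimate one hour's bill under a simplified Savings Plan model.

    on_demand_usage_usd: the hour's usage priced at On-Demand rates.
    commitment_usd: the committed spend per hour.
    discount: assumed flat discount as a fraction, for example 0.30.
    """
    # A commitment of C dollars covers C / (1 - discount) dollars of
    # usage measured at On-Demand prices.
    covered = min(on_demand_usage_usd, commitment_usd / (1 - discount))
    uncovered = on_demand_usage_usd - covered
    # The commitment is always paid; the remainder is billed On-Demand.
    return commitment_usd + uncovered


# Assumed scenario: a 7.00 USD/hour commitment at a 30% discount.
for usage in (4.0, 10.0, 15.0):
    cost = hourly_cost(usage, commitment_usd=7.0, discount=0.30)
    print(f"usage {usage:.2f} USD at On-Demand rates -> bill {cost:.2f} USD")
```

In this model, an hour with only 4.00 USD of usage still costs the full commitment, which is why the text suggests committing only to the fixed minimum load you know you need to support.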
Tools and services to help with cost optimization