AWS Open Source Blog

Cost tracking for OpenShift on AWS

AWS provides a collection of tools and services to give customers the ability to manage the resources within their AWS accounts. In this article I will briefly explore some of these tools and services, as well as an open source project that can be used to integrate AWS cost-management services and features directly into Red Hat OpenShift.

Overview

Agility, innovation, operational leanness, and cost reduction are some of the leading drivers for business modernization. These are not simply industry buzzwords; customers I regularly speak with are exploring ways to make their teams more self-sufficient, and to let developers provision and manage the services they need without depending on other teams. Being able to do more with less, and to deliver more without adding cost, has become increasingly important.

Customers migrating OpenShift to AWS consistently ask me a few questions:

  • I am new to AWS. How do I compare my costs on AWS to my costs on premises?
  • How do I track AWS resources that underpin OpenShift?
  • We want to enable our teams to adopt native AWS services. How do we track the costs of teams and projects?
  • Our OpenShift implementation takes advantage of scaling on AWS, which means we scale beyond our subscriptions and need to perform a true-up later. How do we track consumption and report this for billing?

The first step to any optimization is visibility, because decisions can’t be made if you do not have insight into how much is being spent and on what.

AWS Trusted Advisor

The AWS Trusted Advisor is a tool that allows customers to gain insight into possible improvements within an account. These areas for review are grouped into pillars covering cost, performance, security, reliability, and AWS service limits.

AWS Trusted Advisor Dashboard

Looking at the Trusted Advisor Dashboard above, Cost Optimization shows that there might be ways to lower costs. Is the problem simply waste, and should resources be removed? Has something been over-provisioned, and should we take advantage of the agility of running on AWS to scale down? Or is a migration of workloads about to take place, in which case it is critical to complete the migrations as soon as possible so that the workloads grow into the over-provisioned resources?

Moving on, let’s look more deeply into Cost Optimization.

Right away there appear to be Amazon EC2 compute resources that can be optimized. These may be underutilized resources that should be reviewed and, if they are not being used at all, potentially terminated. Can they be auto scaled, or stopped and started again only when needed?

For customers running OpenShift on AWS, the master and infrastructure nodes should be considered. If these are new clusters deployed as part of a migration, I would take steps to ensure that application workload migrations are on track. If this is an existing implementation and workloads are already running, I would look at the instance sizing and utilization. Are the instance sizes too large for the workloads? If so, perhaps I should scale these down. For long-running workloads such as OpenShift master nodes, am I taking advantage of AWS Reserved Instances and Savings Plans?

Are there underutilized Amazon Elastic Block Store (Amazon EBS) volumes attached to EC2 instances? Can these be removed?

The screenshot below shows the benefit of tagging resources. I will come back to tagging and how it can be used for cost tracking, billing, and chargebacks later in this post.
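
Tags can be applied to the EC2 instances and Amazon EBS volumes that make up a cluster so that their costs can be grouped and filtered later. The following is a minimal sketch using the AWS CLI; the resource IDs, tag keys, and values are placeholders:

# Tag an EC2 instance and an EBS volume so their costs can be grouped later.
# The resource IDs and tag values are illustrative placeholders.
aws ec2 create-tags \
    --resources i-0abc123example0000 vol-0def456example0000 \
    --tags Key=project,Value=openshift-dev Key=cost-center,Value=platform

# Confirm the tags were applied.
aws ec2 describe-tags --filters "Name=resource-id,Values=i-0abc123example0000"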

Trusted Advisor is just that—an advisor that highlights items that could warrant additional consideration; however, it is not able to provide clear context, use case, and business impact. As such, careful review is needed to discover why the above items are the way they are. For example, are volumes underutilized temporarily as workloads are being migrated? Is there an automation improvement that would help prevent leaving residual resources behind? Is there a risk of data loss if these items are removed? Has right-sizing of resources been completed? Is there a need for education or process change within the teams? Trusted Advisor is a good place for teams to get visibility and start asking the right questions.

AWS Cost Explorer

Let’s check out the AWS Cost Explorer. As you can see below, the AWS Cost Explorer has a rich feature set.

For example, you can compare a current bill to a projection of what the costs will be at the end of the month. The Cost Explorer can be helpful during proof of concept (POC) processes to explore and extrapolate costs as you move into migration and further adoption phases. The daily and monthly visualizations help provide guidance on costs related to scaling, and drive decisions about when to take advantage of constructs such as AWS Reserved Instances or Amazon EC2 Spot Instances for certain workloads.

The AWS Cost Explorer also has pre-built reports and the ability to create custom reports.
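
The same data is also available programmatically through the Cost Explorer API. As a minimal sketch with the AWS CLI, the following groups one month of unblended cost by service; the date range is a placeholder:

# Group one month of unblended cost by service (dates are placeholders).
aws ce get-cost-and-usage \
    --time-period Start=2020-06-01,End=2020-07-01 \
    --granularity MONTHLY \
    --metrics "UnblendedCost" \
    --group-by Type=DIMENSION,Key=SERVICE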

As you can see below, there is typically significant EC2 consumption when running OpenShift on AWS.

As application workloads are deployed onto the OpenShift clusters on AWS, there may be increased EC2 usage resulting from additional EC2 instances and Amazon EBS storage for the worker nodes. I would expect to see other AWS services, such as databases, storage, caching, queuing, and messaging start to manifest within the account as more AWS services are adopted to complement the application workloads running within OpenShift. The per-service report is a good way to gain insight into this adoption:

You can also use AWS Cost and Usage Reports, which provide detailed billing data in CSV format, generated and delivered daily into an Amazon Simple Storage Service (Amazon S3) bucket.
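
A report definition can be created from the Billing console or with the AWS CLI. The sketch below is one way to do it, assuming a bucket that already has a policy allowing the billing service to write to it; the report name, bucket, prefix, and region are placeholders:

# Define a daily Cost and Usage Report delivered into an S3 bucket.
# The report name, bucket, prefix, and region are placeholders.
aws cur put-report-definition --region us-east-1 --report-definition '{
  "ReportName": "openshift-cur",
  "TimeUnit": "DAILY",
  "Format": "textORcsv",
  "Compression": "GZIP",
  "AdditionalSchemaElements": ["RESOURCES"],
  "S3Bucket": "your-cur-bucket",
  "S3Prefix": "cur",
  "S3Region": "us-east-1",
  "RefreshClosedReports": true,
  "ReportVersioning": "CREATE_NEW_REPORT"
}'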

Metering

As of OpenShift version 4.4, customers running OpenShift on AWS have the ability to enable metering within the application platform.

Administrators can install and configure metering using the Metering Operator, either from the OperatorHub within the OpenShift console or from the CLI:

OpenShift Metering Operator

OpenShift view showing installed Operators
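
For the CLI route, the install amounts to creating the openshift-metering namespace and subscribing to the Metering Operator through Operator Lifecycle Manager. The following is a rough sketch only; verify the package name and channel for your cluster version against the OpenShift documentation:

# Rough sketch of a CLI-based install; the package name and channel are
# assumptions and may differ by OpenShift version.
oc create namespace openshift-metering

cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: openshift-metering
  namespace: openshift-metering
spec:
  targetNamespaces:
  - openshift-metering
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: metering-ocp
  namespace: openshift-metering
spec:
  channel: "4.4"              # assumption: matches the cluster version
  name: metering-ocp          # assumption: package name in the catalog
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF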

The Metering Operator will deploy several pods:

$ oc -n openshift-metering get pods

NAME                                  READY   STATUS    RESTARTS   AGE
hive-metastore-0                      2/2     Running   0          3m28s
hive-server-0                         3/3     Running   0          3m28s
metering-operator-68dd64cfb6-2k7d9    2/2     Running   0          5m17s
presto-coordinator-0                  2/2     Running   0          3m9s
reporting-operator-5588964bf8-x2tkn   2/2     Running   0          2m40s

The metering function builds on the cluster's performance monitoring data. The following command returns the various sources of metering data:

oc get reportdatasources -n openshift-metering | grep -v raw

NAME                                         EARLIEST METRIC        NEWEST METRIC          IMPORT START           IMPORT END             LAST IMPORT TIME       AGE
node-allocatable-cpu-cores                   2019-08-05T16:52:00Z   2019-08-05T18:52:00Z   2019-08-05T16:52:00Z   2019-08-05T18:52:00Z   2019-08-05T18:54:45Z   9m50s
node-allocatable-memory-bytes                2019-08-05T16:51:00Z   2019-08-05T18:51:00Z   2019-08-05T16:51:00Z   2019-08-05T18:51:00Z   2019-08-05T18:54:45Z   9m50s
node-capacity-cpu-cores                      2019-08-05T16:51:00Z   2019-08-05T18:29:00Z   2019-08-05T16:51:00Z   2019-08-05T18:29:00Z   2019-08-05T18:54:39Z   9m50s
node-capacity-memory-bytes                   2019-08-05T16:52:00Z   2019-08-05T18:41:00Z   2019-08-05T16:52:00Z   2019-08-05T18:41:00Z   2019-08-05T18:54:44Z   9m50s
persistentvolumeclaim-capacity-bytes         2019-08-05T16:51:00Z   2019-08-05T18:29:00Z   2019-08-05T16:51:00Z   2019-08-05T18:29:00Z   2019-08-05T18:54:43Z   9m50s
persistentvolumeclaim-phase                  2019-08-05T16:51:00Z   2019-08-05T18:29:00Z   2019-08-05T16:51:00Z   2019-08-05T18:29:00Z   2019-08-05T18:54:28Z   9m50s
persistentvolumeclaim-request-bytes          2019-08-05T16:52:00Z   2019-08-05T18:30:00Z   2019-08-05T16:52:00Z   2019-08-05T18:30:00Z   2019-08-05T18:54:34Z   9m50s
persistentvolumeclaim-usage-bytes            2019-08-05T16:52:00Z   2019-08-05T18:30:00Z   2019-08-05T16:52:00Z   2019-08-05T18:30:00Z   2019-08-05T18:54:36Z   9m49s
pod-limit-cpu-cores                          2019-08-05T16:52:00Z   2019-08-05T18:30:00Z   2019-08-05T16:52:00Z   2019-08-05T18:30:00Z   2019-08-05T18:54:26Z   9m49s
pod-limit-memory-bytes                       2019-08-05T16:51:00Z   2019-08-05T18:40:00Z   2019-08-05T16:51:00Z   2019-08-05T18:40:00Z   2019-08-05T18:54:30Z   9m49s
pod-persistentvolumeclaim-request-info       2019-08-05T16:51:00Z   2019-08-05T18:40:00Z   2019-08-05T16:51:00Z   2019-08-05T18:40:00Z   2019-08-05T18:54:37Z   9m49s
pod-request-cpu-cores                        2019-08-05T16:51:00Z   2019-08-05T18:18:00Z   2019-08-05T16:51:00Z   2019-08-05T18:18:00Z   2019-08-05T18:54:24Z   9m49s
pod-request-memory-bytes                     2019-08-05T16:52:00Z   2019-08-05T18:08:00Z   2019-08-05T16:52:00Z   2019-08-05T18:08:00Z   2019-08-05T18:54:32Z   9m49s
pod-usage-cpu-cores                          2019-08-05T16:52:00Z   2019-08-05T17:57:00Z   2019-08-05T16:52:00Z   2019-08-05T17:57:00Z   2019-08-05T18:54:10Z   9m49s
pod-usage-memory-bytes                       2019-08-05T16:52:00Z   2019-08-05T18:08:00Z   2019-08-05T16:52:00Z   2019-08-05T18:08:00Z   2019-08-05T18:54:20Z   9m49s
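
Once the data sources are importing, they can be turned into reports by creating Report resources that reference the built-in report queries. A minimal sketch, with an illustrative query name and time range, looks like this:

# Minimal sketch of a Report resource; the query name and dates are illustrative.
cat <<EOF | oc apply -f -
apiVersion: metering.openshift.io/v1
kind: Report
metadata:
  name: namespace-cpu-request
  namespace: openshift-metering
spec:
  query: namespace-cpu-request
  reportingStart: "2019-08-01T00:00:00Z"
  reportingEnd: "2019-08-31T23:59:59Z"
  runImmediately: true
EOF

# Check on the report as it runs.
oc -n openshift-metering get reports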

Ideally, customers running OpenShift will want to store the data collected by the metering process somewhere cost-effective and scalable. As such, metering can be configured to push the data out to Amazon S3. You have the option to make use of an existing Amazon S3 bucket, or the Operator can create the bucket within the AWS account. An AWS Identity and Access Management (IAM) user with permissions to write to the S3 bucket is needed.

The settings below show that the MeteringConfig does not refer to the actual IAM credentials, but instead points to a secret within OpenShift:

apiVersion: metering.openshift.io/v1
kind: MeteringConfig
metadata:
  name: "operator-metering"
spec:
  storage:
    type: "hive"
    hive:
      type: "s3"
      s3:
        bucket: "bucketname/path/" 
        region: "us-west-1" 
        secretName: "my-aws-secret" 
        # Set to false if you want to provide an existing bucket, instead of
        # having metering create the bucket on your behalf.
        createBucket: true 

You can use the following command to store the access key and secret key in OpenShift as a secret. The secret name must match the secretName referenced in the MeteringConfig:

oc create secret -n openshift-metering generic your-aws-secret --from-literal=aws-access-key-id=your-access-key --from-literal=aws-secret-access-key=your-secret-key

So that the Metering Operator can interact with the Amazon S3 bucket, the IAM user will need the following IAM policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "1",
      "Effect": "Allow",
      "Action": [
        "s3:AbortMultipartUpload",
        "s3:DeleteObject",
        "s3:GetObject",
        "s3:HeadBucket",
        "s3:ListBucket",
        "s3:ListMultipartUploadParts",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::operator-metering-data/*",
        "arn:aws:s3:::operator-metering-data"
      ]
    }
  ]
}
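
If you prefer to create that user from the AWS CLI, a sketch along the following lines works, assuming the policy above has been saved locally as metering-policy.json and using a placeholder user name:

# Create a dedicated IAM user for metering, attach the policy above
# (saved locally as metering-policy.json), and generate access keys.
aws iam create-user --user-name openshift-metering
aws iam put-user-policy \
    --user-name openshift-metering \
    --policy-name openshift-metering-s3 \
    --policy-document file://metering-policy.json
aws iam create-access-key --user-name openshift-metering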

Note that the Metering Operator is not able to delete or remove the data in the Amazon S3 bucket, or the bucket itself. This is done to protect against data loss. Should the Metering Operator be uninstalled, these resources will remain in AWS and must be removed.
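
If you uninstall metering and no longer need the data, the bucket can be emptied and deleted with the AWS CLI; the bucket name below matches the one used in the policy above:

# Empty and then remove the metering bucket once the data is no longer needed.
aws s3 rm s3://operator-metering-data --recursive
aws s3 rb s3://operator-metering-data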

reporting-operator

You also must configure the reporting-operator, which collects data from Prometheus, an open source monitoring system. The reporting-operator is configured via the Metering Operator, and it can collect data either from the Prometheus endpoint provided as part of OpenShift monitoring or from an external Prometheus instance. Refer to the OpenShift documentation for more information about configuring the reporting-operator.
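
As a rough sketch only, pointing the reporting-operator at a Prometheus endpoint is done through the same MeteringConfig resource; the field layout and URL below are assumptions drawn from upstream examples, so confirm them against the documentation for your version:

# Rough sketch; the field names and Prometheus URL are assumptions to verify
# against the OpenShift metering documentation for your version.
apiVersion: metering.openshift.io/v1
kind: MeteringConfig
metadata:
  name: "operator-metering"
spec:
  reporting-operator:
    spec:
      config:
        prometheus:
          url: "https://prometheus-k8s.openshift-monitoring.svc:9091"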

Koku

Koku, which is integrated into OpenShift, is an open source solution for cost management of cloud and hybrid cloud environments. Koku exposes resource consumption and cost data in easily digestible and filterable views. The project also aims to provide insight into this data and ultimately provide suggested optimizations for reducing cost and eliminating unnecessary resource usage.

You can now configure cost management within OpenShift, which allows you to aggregate, visualize, and query data from both the OpenShift Metering Operator and the AWS Cost and Usage Reports.

You can correlate the performance data collected by OpenShift metering with the cost data in the Cost and Usage Reports. First, metering and the Cost and Usage Reports must be configured as discussed previously.

The Metering Operator must be updated with the source of the billing information:

apiVersion: metering.openshift.io/v1
kind: MeteringConfig
metadata:
  name: "operator-metering"
spec:
  openshift-reporting:
    spec:
      awsBillingReportDataSource:
        enabled: true
        # Replace these with where your AWS billing reports are
        # stored in S3.
        bucket: "<your-aws-cost-report-bucket>" 
        prefix: "<path/to/report>"
        region: "<your-buckets-region>"

  reporting-operator:
    spec:
      config:
        aws:
          secretName: "<your-aws-secret>" 

  presto:
    spec:
      config:
        aws:
          secretName: "<your-aws-secret>" 

  hive:
    spec:
      config:
        aws:
          secretName: "<your-aws-secret>" 

Cost Management

The OpenShift Cost Management Operator takes this a step further. Customers using OpenShift 4.4 and above can access the Cost Management Operator on the Red Hat site. The Cost Management Operator provides a web console with visibility into all clusters across all environments.

Hybrid customers can see visualizations, reports, and queries for clusters and related AWS resources. You also can inspect costs per region, per AWS account, and per OpenShift cluster.

Effective use of AWS tags can provide even greater detail, such as tying related AWS resources and their costs back to OpenShift projects. Projects are an extension of Kubernetes namespaces; in OpenShift they cater to separation of workloads, role-based access control, and tracking, and they can even assist with chargeback processes.
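
As a hedged illustration, the Red Hat cost management documentation describes special AWS tag keys (for example, openshift_project) that associate a resource's cost with an OpenShift project; the volume ID and project name below are placeholders:

# Associate an EBS volume's cost with an OpenShift project via a tag.
# The openshift_project key is taken from the Red Hat cost management docs;
# the volume ID and project name are placeholders.
aws ec2 create-tags \
    --resources vol-0def456example0000 \
    --tags Key=openshift_project,Value=inventory-app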

OpenShift administrators can install and configure the Cost Management Operator on each cluster. This will upload the data collected by each cluster to cost management in cloud.redhat.com. These clusters can be anywhere within the customer hybrid landscape.

If you log into cloud.redhat.com with your Red Hat account details, you can access the OpenShift Cluster Manager and other tools. I have simply scrolled down in the list to find the Cost Management Console:

From here you can see all the data uploaded from the Cost Management Operator running on the OpenShift clusters in the hybrid environment. You can also add data sources from AWS, such as the Cost and Usage Reports. Adding a source requires an IAM role that is able to access account billing and Amazon S3:
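
At a minimum, that means read access to the bucket holding the Cost and Usage Reports. A hedged sketch of such a policy, with placeholder bucket and role names, follows; the cost management documentation lists the full set of permissions it recommends:

# Hedged sketch: read-only access to the Cost and Usage Report bucket.
# The bucket and role names are placeholders; see the cost management
# documentation for the complete recommended policy.
cat > cost-management-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::your-cur-bucket",
        "arn:aws:s3:::your-cur-bucket/*"
      ]
    }
  ]
}
EOF

aws iam put-role-policy \
    --role-name cost-management-source \
    --policy-name cost-management-s3-read \
    --policy-document file://cost-management-policy.json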

The OpenShift documentation explains adding AWS tags for cost management and adding additional AWS sources.

Once the various sources have been added, the cost management tool aggregates and visualizes the data. You are then able to create reports, cost models, and drill down into the data.

Read the Red Hat Getting Started with Cost Management documentation to learn more about cost management features.

Conclusion

Gaining insight and visibility into costs is a critical step toward understanding and optimizing a cost footprint. AWS and Red Hat provide a rich set of products and features you can use to track and explore costs so you can start asking the right questions as part of your cost modernization journey.

The Koku project and components and features of Red Hat OpenShift are open source, and we encourage you to take part in contributing.

Ryan Niksch

Ryan Niksch is a Partner Solutions Architect focusing on application platforms, hybrid application solutions, and modernization. Ryan has worn many hats in his life and has a passion for tinkering and a desire to leave everything he touches a little better than when he found it.