Cost Optimization Practices on AWS for Telco BSS Workloads
Communication service providers (CSPs) are transforming the IT capabilities behind their Business Support Systems (BSS) to capture the promising economics of 5G, IoT, Mobile Virtual Network Operators (MVNOs), and ultra-low-latency gaming applications. Through this transformation, CSPs aim to achieve operational agility, reduce time to market and operational cost, increase efficiency, and innovate faster. Every CSP’s cloud adoption journey is unique: some start by rehosting (lift-and-shift), while others re-platform or modernize. Regardless of the approach, a cloud cost optimization strategy is fundamental to the modernization plan and to realizing monetization opportunities. The cost optimization pillar of the AWS Well-Architected Framework provides prescriptive guidance on running systems that deliver business value at the lowest price point. It helps customers take control of costs and continuously optimize spend while building modern, scalable applications that meet their customers’ needs. In this post, we discuss the pillars of cost optimization in the cloud journey through the telecom BSS lens.
Cost optimization pillars
Cost optimization is a continuous process, and a CSP’s Cloud Center of Excellence (CCoE) team must adopt mechanisms that improve costs consistently. Regardless of migration strategy and architecture, five cost optimization pillars apply across nearly all environments: right sizing compute instances to match workloads’ performance and capacity requirements at the lowest possible cost; increasing elasticity by meeting demand dynamically and turning off resources when they aren’t needed; choosing the right pricing model, such as Savings Plans, Spot Instances, or On-Demand Instances, based on the nature of each workload; optimizing storage, for example by using Amazon Elastic Block Store (Amazon EBS) for performance requirements or the Amazon Simple Storage Service (Amazon S3) storage classes for regulatory retention; and measuring and monitoring, the mechanisms that help govern and manage these optimizations. The pillars of cost optimization are shown in the following image.
Figure 1: Cost Optimizing Best Practices
Right sizing is the process of matching instance types and sizes to workload performance and capacity requirements at the lowest possible cost. It’s also the process of reviewing deployed instances and identifying opportunities to eliminate or downsize them without compromising capacity or other requirements. Telecom BSS components often have different instance type requirements. For example, rating and charging applications that rely on enterprise-class and in-memory databases require different instance types than compute-bound batch billing applications. In the AWS Cloud, CSPs can find the most appropriate mix of instance types and sizes for each application: general purpose instances that balance compute, memory, and networking resources for Order Management and CRM applications; compute-optimized instances that provide high-performance processing for billing and mediation applications; or memory-optimized instances designed to process large data sets in memory for Policy and Charging applications.
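To make the right-sizing idea concrete, the following Python sketch picks the lowest-cost instance type that satisfies a workload's vCPU and memory requirements. The vCPU and memory figures match the real Graviton2 sizes listed, but the hourly prices in `CATALOG` are hypothetical placeholders, not AWS quotes; a real selection would use current pricing and measured requirements.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    memory_gib: int
    hourly_usd: float  # hypothetical placeholder, not an AWS price

# vCPU/memory match the real instance sizes; prices are illustrative only.
CATALOG = [
    InstanceType("c6g.2xlarge", 8, 16, 0.27),
    InstanceType("m6g.2xlarge", 8, 32, 0.31),
    InstanceType("r6g.2xlarge", 8, 64, 0.40),
    InstanceType("r6g.4xlarge", 16, 128, 0.81),
]

def cheapest_fit(vcpus_needed, memory_gib_needed, catalog=CATALOG):
    """Return the lowest-cost instance type meeting both CPU and memory needs."""
    fits = [i for i in catalog
            if i.vcpus >= vcpus_needed and i.memory_gib >= memory_gib_needed]
    if not fits:
        raise ValueError("no instance type satisfies the requirements")
    return min(fits, key=lambda i: i.hourly_usd)

# Compute-bound batch billing: CPU-heavy, modest memory.
print(cheapest_fit(8, 12).name)  # c6g.2xlarge
# In-memory charging database: memory-heavy.
print(cheapest_fit(8, 48).name)  # r6g.2xlarge
```

Filtering on hard requirements first and only then minimizing cost means a memory-bound charging database and a compute-bound billing job naturally land on different instance families.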
Traditional on-premises telecom workloads are designed for peak usage. For example, mediation systems are designed to handle up to three days of backlogged Call Detail Records (CDRs). Organizations often ignore right sizing when they first move to the AWS Cloud: they lift and shift their environments and plan to right size them later. Speed and performance are often prioritized over cost, which results in oversized instances and significant spend on unused resources.
It’s also important to right size database instances for each environment, such as Development, Testing, Pre-Production, and Production, because each environment has different SLA and performance requirements. For example, in the US East region, for a db.m4.2xlarge instance with a bring-your-own-license (BYOL) enterprise license, CSPs can save $522.50 USD per month by running the development environment in Single-AZ rather than Multi-AZ mode. CSPs also often keep environments for specific tasks, such as performance testing or regression testing. After testing completes, the Amazon Relational Database Service (Amazon RDS) instances in those environments sit underutilized until the next testing cycle. CSPs can use AWS Trusted Advisor to identify idle or underutilized Amazon RDS instances and then stop them.
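A minimal sketch of the "find idle test databases and stop them" routine follows. It assumes each DB instance record carries a hypothetical `environment` tag and a `last_connection` timestamp (for example, derived from the CloudWatch `DatabaseConnections` metric), and it treats seven days without a connection as idle. That threshold is an assumption to tune, not a Trusted Advisor setting.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class DbInstance:
    identifier: str
    environment: str           # hypothetical tag: "dev", "test", "preprod", "prod"
    last_connection: datetime  # hypothetical field: last time a client connected

IDLE_THRESHOLD = timedelta(days=7)  # assumption; tune per testing cadence

def instances_to_stop(instances, now, protected_envs=("prod",)):
    """Return identifiers of non-protected DB instances idle for >= IDLE_THRESHOLD."""
    return [
        db.identifier
        for db in instances
        if db.environment not in protected_envs
        and now - db.last_connection >= IDLE_THRESHOLD
    ]

now = datetime(2024, 1, 15, tzinfo=timezone.utc)
fleet = [
    DbInstance("perf-test-db", "test", now - timedelta(days=12)),
    DbInstance("dev-db", "dev", now - timedelta(days=1)),
    DbInstance("billing-prod-db", "prod", now - timedelta(days=30)),
]
print(instances_to_stop(fleet, now))  # ['perf-test-db']
```

Each returned identifier could then be passed to the Amazon RDS StopDBInstance API, for example with boto3's `stop_db_instance(DBInstanceIdentifier=...)`. Amazon RDS automatically restarts a stopped DB instance after seven days, so longer idle periods require the stop to be reapplied.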
Elasticity is the ability to acquire resources as applications need them and release them when they’re no longer needed. BSS workloads require elasticity to support real-time rating and unevenly distributed bill cycle processing, and to handle the new wave of digital services enabled by 5G. To meet these requirements, BSS systems such as Configure Price Quote (CPQ) and Order Manager (OM) should be able to scale out rapidly. This provides the compute or storage resources needed to absorb peaks and deliver an outstanding customer experience while handling time-sensitive, complex orders that span 1,000+ sites. The key is to release those resources when they’re no longer needed, which reduces costs.
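Scaling CPQ or OM capacity out during an order surge and back in afterward typically follows a proportional rule. The Python sketch below mirrors the formula documented for the Kubernetes Horizontal Pod Autoscaler, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), including its default 10% tolerance band. The `min_replicas`/`max_replicas` bounds and the sample CPU figures are illustrative assumptions.

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=50, tolerance=0.1):
    """Proportional scaling rule in the style of the Kubernetes HPA."""
    ratio = current_metric / target_metric
    # Within the tolerance band, keep the current size to avoid flapping.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured floor and ceiling.
    return max(min_replicas, min(max_replicas, desired))

# Order-capture surge: 4 pods at 90% CPU against a 60% target -> scale out.
print(desired_replicas(4, 90, 60))  # 6
# Quiet period: 6 pods at 30% CPU -> scale back in and release capacity.
print(desired_replicas(6, 30, 60))  # 3
```

The same rule handles both directions, which is what makes elasticity cut cost: capacity added for a peak is given back as soon as the metric falls.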
To implement elasticity, CSPs should identify variable workloads and their operational boundaries. CSPs should use AWS Auto Scaling with services like Amazon Elastic Compute Cloud (Amazon EC2), and horizontal pod autoscaling and cluster autoscaling when working with containers. Amazon EC2 Auto Scaling also supports scheduled scaling for predictable load changes, such as nightly development builds, rating engine runs, billing calculations, or CDR processing. Once these workloads complete, CSPs can shut down the infrastructure to reduce cost. CSPs should test elasticity in both directions, scaling up and down, to confirm that it meets the load-variance requirements, and should iterate on implementation and testing until it does.
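As a sketch of scheduled scaling for a batch window, the Python helper below builds a scale-out/scale-in pair of scheduled actions whose keys are the parameter names of the EC2 Auto Scaling PutScheduledUpdateGroupAction API. The group name `cdr-batch-asg`, the 01:00 to 05:00 window, and the capacity of 20 are hypothetical.

```python
import json

def scheduled_actions(asg_name, start_cron, end_cron, peak, baseline=0, tz="UTC"):
    """Build a scale-out / scale-in pair of EC2 Auto Scaling scheduled actions."""
    def action(suffix, cron, size):
        return {
            "AutoScalingGroupName": asg_name,
            "ScheduledActionName": f"{asg_name}-{suffix}",
            "Recurrence": cron,  # Unix cron syntax: minute hour day month weekday
            "MinSize": size,
            "MaxSize": size,
            "DesiredCapacity": size,
            "TimeZone": tz,
        }
    return [action("scale-out", start_cron, peak),
            action("scale-in", end_cron, baseline)]

# Hypothetical nightly CDR-processing group: 20 instances from 01:00 to 05:00,
# then scaled to zero so nothing runs outside the batch window.
actions = scheduled_actions("cdr-batch-asg", "0 1 * * *", "0 5 * * *", peak=20)
print(json.dumps(actions, indent=2))
```

With boto3, each entry could be applied via `boto3.client("autoscaling").put_scheduled_update_group_action(**action)`. Pinning MinSize, MaxSize, and DesiredCapacity to the same value makes the window deterministic. A group that also needs reactive scaling inside the window would instead use a wider MaxSize and a target tracking policy.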
Leverage the right pricing model
AWS offers multiple pricing models that let CSPs pay for resources in the most cost-effective way for their organization’s needs, and picking the right one is key to saving costs. For example, if a CSP’s BSS applications use a container or serverless architecture, Compute Savings Plans with a one- or three-year term can reduce Amazon EC2 and AWS Fargate (a serverless compute engine for containers) costs by up to 66%, and EC2 Instance Savings Plans can reduce Amazon EC2 costs by up to 72%. CSPs can also use Amazon EC2 Spot Instances to save up to 90% off On-Demand prices for fault-tolerant, stateless workloads, such as batch billing, next-best-offer analytics, or capacity planning. For databases, by purchasing Reserved Instances (RIs) for Amazon RDS, Amazon Redshift, Amazon ElastiCache, and Amazon OpenSearch Service, CSPs can save up to 72% over equivalent On-Demand capacity.
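You can estimate the effect of mixing pricing models with simple arithmetic. This Python sketch compares a hypothetical fleet of 20 steady-state instances and 30 fault-tolerant batch instances run fully On-Demand against a best case that applies the "up to" discounts quoted above (72% for Savings Plans, 90% for Spot). The $0.10/hour rate and fleet sizes are assumptions. Real discounts depend on instance family, Region, term, and Spot market conditions.

```python
HOURS_PER_MONTH = 730  # common approximation used for monthly estimates

def monthly_cost(hourly_rate, count, discount=0.0):
    """Monthly cost for `count` instances at `hourly_rate` after a fractional discount."""
    return hourly_rate * count * HOURS_PER_MONTH * (1.0 - discount)

RATE = 0.10            # hypothetical On-Demand $/hour per instance
STEADY, BATCH = 20, 30 # hypothetical fleet split

all_on_demand = monthly_cost(RATE, STEADY + BATCH)
best_case = (monthly_cost(RATE, STEADY, 0.72)   # steady baseline on Savings Plans
             + monthly_cost(RATE, BATCH, 0.90)) # fault-tolerant batch on Spot

print(f"On-Demand only: ${all_on_demand:,.2f}")  # $3,650.00
print(f"Best case:      ${best_case:,.2f}")      # $627.80
print(f"Savings:        {1 - best_case / all_on_demand:.1%}")
```

Covering only the steady baseline with a commitment and putting interruptible work on Spot keeps the commitment small while still capturing most of the savings.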
CSPs should match storage classes to each use case based on data type, access patterns, and IOPS requirements. Consider a few BSS use cases: mediation systems are designed to handle up to three days of backlogged CDRs; Order Capture applications are designed to handle infrequent, large, and complex orders; and CSPs sometimes must retain historical data for seven years to meet regulatory requirements. Moreover, on-premises storage and servers are typically procured once every five years at peak configuration.
Amazon S3 offers a range of storage classes designed for different use cases. For example, CSPs can store the last three months of bill invoice PDF files in Amazon S3 Standard. Using Amazon S3 Lifecycle policies, CSPs can save up to 60% on Amazon S3 pricing by moving less frequently accessed data from S3 Standard to S3 Standard-Infrequent Access (S3 Standard-IA). By moving rarely accessed data (e.g., once per quarter) to the Amazon S3 Glacier Instant Retrieval storage class, CSPs can save up to 68% on storage costs compared with S3 Standard-IA. Amazon S3 Glacier Flexible Retrieval (formerly S3 Glacier) delivers up to 10% lower cost than S3 Glacier Instant Retrieval for archive data that is accessed 1-2 times per year and retrieved asynchronously. Amazon S3 Glacier Deep Archive is Amazon S3’s lowest-cost storage class and supports long-term retention and digital preservation for data that may be accessed once or twice per year.
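The invoice tiering described above can be expressed as one S3 Lifecycle rule. This Python sketch builds a configuration in the shape the S3 PutBucketLifecycleConfiguration API expects. The `invoices/` prefix, rule ID, and day thresholds are hypothetical choices, and the seven-year expiry is approximated as 7 × 365 days. A small check guards against common mistakes before the configuration is sent.

```python
import json

SEVEN_YEARS_DAYS = 7 * 365  # approximate seven-year retention, in days

invoice_lifecycle = {
    "Rules": [
        {
            "ID": "bill-invoice-tiering",       # hypothetical rule name
            "Filter": {"Prefix": "invoices/"},  # hypothetical key prefix
            "Status": "Enabled",
            "Transitions": [
                {"Days": 90, "StorageClass": "STANDARD_IA"},    # after ~3 months
                {"Days": 180, "StorageClass": "GLACIER_IR"},    # rare access
                {"Days": 365, "StorageClass": "GLACIER"},       # Glacier Flexible Retrieval
                {"Days": 730, "StorageClass": "DEEP_ARCHIVE"},  # long-term retention
            ],
            "Expiration": {"Days": SEVEN_YEARS_DAYS},
        }
    ]
}

def check_rule(rule):
    """Basic sanity checks before sending the configuration to S3."""
    days = [t["Days"] for t in rule["Transitions"]]
    if any(b <= a for a, b in zip(days, days[1:])):
        raise ValueError("transition days must be strictly increasing")
    for t in rule["Transitions"]:
        if t["StorageClass"] == "STANDARD_IA" and t["Days"] < 30:
            raise ValueError("S3 Standard-IA transitions need at least 30 days")
    if rule["Expiration"]["Days"] <= days[-1]:
        raise ValueError("expiration must come after the last transition")

check_rule(invoice_lifecycle["Rules"][0])
print(json.dumps(invoice_lifecycle, indent=2))
```

With boto3, this dictionary could be passed as `LifecycleConfiguration` to `s3.put_bucket_lifecycle_configuration(Bucket=..., ...)`, where the bucket name is yours to supply. The Glacier classes carry minimum storage duration charges, so tune the day thresholds to real access patterns.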
For container workloads that need NFS-based shared storage, such as APIs, web servers, and CI/CD, CSPs can use Amazon Elastic File System (Amazon EFS). Operators such as T-Mobile saved between 40% and 70% on storage costs because they no longer needed to overprovision NFS storage. EFS lifecycle management policies also let CSPs automatically move files into the EFS Infrequent Access (EFS IA) storage class as access patterns change, saving up to 85%. Lifecycle management can be enabled on any EFS file system.
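As a sketch of an EFS lifecycle configuration, the Python helper below builds the `LifecyclePolicies` list that the EFS PutLifecycleConfiguration API accepts. It moves files to EFS IA after a chosen number of days and optionally back to the primary storage class on first access. The set of allowed day values is a subset of what EFS accepts and should be checked against current documentation.

```python
def efs_lifecycle_policies(ia_after_days, back_on_first_access=True):
    """Build the LifecyclePolicies list for EFS PutLifecycleConfiguration."""
    allowed = {1, 7, 14, 30, 60, 90}  # subset of accepted values (assumption)
    if ia_after_days not in allowed:
        raise ValueError(f"unsupported TransitionToIA period: {ia_after_days} days")
    value = "AFTER_1_DAY" if ia_after_days == 1 else f"AFTER_{ia_after_days}_DAYS"
    # One transition per policy object, so each goes in its own list entry.
    policies = [{"TransitionToIA": value}]
    if back_on_first_access:
        policies.append({"TransitionToPrimaryStorageClass": "AFTER_1_ACCESS"})
    return policies

print(efs_lifecycle_policies(30))
```

With boto3, this could be applied as `boto3.client("efs").put_lifecycle_configuration(FileSystemId="fs-0123456789abcdef0", LifecyclePolicies=efs_lifecycle_policies(30))`, where the file system ID is hypothetical.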
Measure, monitor, and improve
CSPs can use AWS tools, such as Amazon CloudWatch, AWS Cost Explorer, Trusted Advisor, and AWS Compute Optimizer, to evaluate costs, as well as monitor and analyze instance usage for right sizing. Using CloudWatch, CSPs can observe CPU utilization, network throughput, and disk I/O, and match the observed peak metrics to a new and cheaper instance type. AWS Cost Explorer analyzes cost and usage data to identify trends, cost drivers, and anomalies. Trusted Advisor provides real-time insights into service usage, identifying idle and underutilized resources and looking for opportunities to save money. Compute Optimizer also provides downsizing recommendations within or across instance families, upsizing recommendations to remove performance bottlenecks, and recommendations for EC2 instances that are part of an Auto Scaling group.
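The "match observed peak metrics to a cheaper instance" step reduces to a small calculation. This Python sketch takes CPU utilization samples, such as the Maximum statistic of the CloudWatch `CPUUtilization` metric, and estimates how many vCPUs would run that peak at a target utilization. The 50% target headroom and the sample values are assumptions.

```python
import math

def required_vcpus(cpu_samples_pct, current_vcpus, target_peak_pct=50.0):
    """vCPUs needed so the observed peak load would run at target_peak_pct."""
    peak_pct = max(cpu_samples_pct)                # observed peak utilization
    busy_vcpus = current_vcpus * peak_pct / 100.0  # vCPUs actually in use at peak
    return max(1, math.ceil(busy_vcpus / (target_peak_pct / 100.0)))

# Hypothetical hourly Maximum CPUUtilization samples from a 16-vCPU billing host.
samples = [5.0, 12.5, 20.0, 9.0]
print(required_vcpus(samples, current_vcpus=16))  # 7 -> an 8-vCPU size would fit
```

CPU is only one dimension. EC2 does not publish memory utilization by default (it requires the CloudWatch agent), and AWS Compute Optimizer performs a more complete version of this analysis across several metrics.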
We have also observed additional practices in the field that CSPs can use for further cost savings.
Using Amazon Linux to save costs
To stop spending money on undifferentiated heavy lifting, CSPs can deploy telecom BSS workloads on Amazon Linux 2, a Linux server operating system from AWS. Amazon Linux 2 is optimized for Amazon EC2 and ships with a recent, tuned Linux kernel. It offers long-term support, a common need for many CSPs and Independent Software Vendors (ISVs) running telco-grade applications, and is provided at no additional charge.
The following table compares costs for common Amazon EC2 instance types used in several kinds of BSS workloads in the US East (N. Virginia) region under an EC2 Savings Plan, across Amazon Linux 2, Red Hat Enterprise Linux (RHEL), and SUSE Linux Enterprise Server (SLES). (For a complete list of Amazon EC2 instance types, see the Amazon EC2 Instance Types page.) For example, on c6g.2xlarge, customers can save up to $109.50 per month per virtual machine by switching from SLES to Amazon Linux 2. A typical BSS workload, such as Billing or Order Manager, can span 100 or more virtual machines across all environments. At 100 virtual machines, that adds up to savings of up to $394,200 over three years.
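The three-year figure follows directly from the per-VM saving. This short Python calculation reproduces it, assuming 100 VMs (the lower end of "100+") and a 36-month horizon. It also shows the implied hourly difference, assuming the common 730-hours-per-month approximation.

```python
MONTHLY_SAVING_PER_VM = 109.50  # c6g.2xlarge, SLES -> Amazon Linux 2 (from the text)
VM_COUNT = 100                  # lower end of "100+" VMs across environments
MONTHS = 36                     # three-year horizon
HOURS_PER_MONTH = 730           # assumption: common monthly approximation

three_year_saving = MONTHLY_SAVING_PER_VM * VM_COUNT * MONTHS
hourly_difference = MONTHLY_SAVING_PER_VM / HOURS_PER_MONTH

print(f"Three-year saving: ${three_year_saving:,.2f}")                # $394,200.00
print(f"Implied hourly difference per VM: ${hourly_difference:.2f}")  # $0.15
```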
Table 1: EC2 instance Pricing Model using Savings Plan
Managed Services to reduce costs
AWS offers 15+ purpose-built, fully managed database engines, such as Amazon RDS and Amazon DynamoDB, to support BSS applications such as CPQ, Catalog, Order Manager, Billing, and Data Warehouse. Amazon RDS provides a selection of DB instance classes with varying combinations of CPU, memory, storage, and networking capacity.
AWS provides managed services that help CSPs ingest network data at scale and move, analyze, and store it. Amazon Managed Streaming for Apache Kafka (Amazon MSK) provides a path to migrate Apache Kafka workloads, including Kafka Streams applications, to the AWS Cloud. Amazon MSK provides scaling capabilities while eliminating the effort of self-managing Apache Kafka brokers and their associated components. To help customers move to fully managed database and analytics services, AWS offers the Database Freedom program, which assists customers in migrating from legacy commercial databases, such as Oracle, to AWS databases. CSPs can use the AWS Schema Conversion Tool (AWS SCT) and AWS Database Migration Service (AWS DMS) to move to Amazon RDS for PostgreSQL, which provides the managed-service benefits of Amazon RDS without Oracle license costs. Alternatively, they can move to Amazon Aurora PostgreSQL, which delivers up to three times the throughput of standard PostgreSQL and provides the security, availability, and reliability of commercial databases at 1/10th the cost.
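When AWS DMS replicates an Oracle schema into PostgreSQL, a task's table mappings decide what is copied and how names are transformed. The Python sketch below builds a table-mapping document in DMS's JSON rule format that selects every table in a hypothetical `BILLING` schema. It lowercases schema and table names, because Oracle stores unquoted identifiers in uppercase while PostgreSQL folds them to lowercase. Treat it as a starting point rather than a complete migration plan.

```python
import json

def dms_table_mappings(source_schema):
    """Selection plus lowercase-transformation rules for one source schema."""
    return {
        "rules": [
            {   # Copy every table ("%" is the DMS wildcard) in the schema.
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-schema",
                "object-locator": {"schema-name": source_schema, "table-name": "%"},
                "rule-action": "include",
            },
            {   # Oracle's uppercase schema name becomes lowercase in PostgreSQL.
                "rule-type": "transformation",
                "rule-id": "2",
                "rule-name": "lowercase-schema",
                "rule-target": "schema",
                "object-locator": {"schema-name": source_schema},
                "rule-action": "convert-lowercase",
            },
            {   # Same for every table name in the schema.
                "rule-type": "transformation",
                "rule-id": "3",
                "rule-name": "lowercase-tables",
                "rule-target": "table",
                "object-locator": {"schema-name": source_schema, "table-name": "%"},
                "rule-action": "convert-lowercase",
            },
        ]
    }

print(json.dumps(dms_table_mappings("BILLING"), indent=2))
```

The resulting JSON string is what the `TableMappings` parameter of the DMS CreateReplicationTask API expects, for example through boto3's `create_replication_task`.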
Amazon Elastic Kubernetes Service (Amazon EKS) enables CSPs to run Kubernetes-conformant OSS applications at scale on AWS without installing and operating their own Kubernetes control plane. At the time of writing, one Amazon EKS cluster could support up to 10 managed node groups, each with up to 100 nodes. These default service quotas change over time, so check the current Amazon EKS quotas when sizing clusters. Offloading the control plane reduces the overall complexity of the BSS stack and limits the traditional control plane overhead associated with on-premises workloads. By using managed CI/CD pipeline services, such as AWS CodePipeline, CSPs can reduce complexity and manual errors in their DevOps process, which reduces development and maintenance costs.
In this post, we showed how to adopt cost optimization best practices to reduce the overall cost of BSS solutions. These practices include choosing the right pricing models, right sizing environments, using the Database Freedom program, and using tools such as Trusted Advisor, AWS Cost Explorer, the AWS Cost and Usage Report, and Compute Optimizer.
To learn more about how telecommunications companies are leveraging AWS Services, visit Telecom on AWS.