How Coinbase Built a Cloud Center of Excellence to Optimize their Cloud Costs on AWS
Dr. Adam Link, Engineering Manager, Coinbase Cloud Center of Excellence and Schalk Theron, Director of Engineering, Coinbase Cloud Foundations contributed to this blog.
Introduction and Challenges
Coinbase is a secure online platform for buying, selling, transferring, and storing cryptocurrency. Their mission is to create an open financial system for the world and to be the leading global brand for helping people convert crypto into and out of their local currency.
In 2022, Coinbase sought to optimize its cloud computing costs as part of its scaling strategy to support the next 1 billion users globally. Coinbase launched an internal strategic initiative with the goal of reducing costs within six months across its cloud vendors, including AWS, through financial and technical optimizations. These savings would allow Coinbase to reinvest in the business and innovate in new areas for its customers, such as increased reliability, lower latency, and increased localization as Coinbase grows its global footprint.
Defining Business Goals
The Coinbase Director of Infrastructure, Schalk Theron, led the project to ensure executive alignment across the company. He started by implementing spending guardrails to reduce inefficiencies and track optimization progress each week until the targets were met. The team identified the need for product-level cost visibility and developed internal tools, such as an Amazon QuickSight dashboard, to provide deeper insight into cloud spending patterns.
Each business unit empowered a representative, called a Product Group Cost DRI, to achieve the specific cost reduction goal assigned to that unit. Each of these representatives worked directly with a newly formed Coinbase Cloud Center of Excellence (CCoE). This combination of centralized and decentralized approaches enabled the reduction in cloud spending.
Evolution of the Coinbase CCoE
During the implementation of its cost reduction project, Coinbase realized that its use of AWS resources was based on outdated best practices given how long it had been operating. Recognizing outdated architecture as technical debt, which impacted its ability to ship the highest quality products quickly, Coinbase formed a CCoE with a mission to align Coinbase’s infrastructure with the latest guidelines from the AWS Well-Architected Framework.
The CCoE, led by Engineering Manager Dr. Adam Link, addressed Coinbase-wide cost optimization. Each business unit within Coinbase was assigned a dedicated CCoE engineer, typically a seasoned senior or staff-level software engineer with a strong cloud background, who worked to understand the unit’s particular workloads and apply the AWS Well-Architected Framework. Although the Cost Optimization pillar of the Well-Architected Framework was crucial to success, other pillars such as Performance Efficiency also played an essential role. Coinbase’s CCoE focused on translating AWS cost guidance into actionable steps that product teams could implement to improve overall efficiency.
The CCoE played a crucial role in driving change within the organization by producing code and configuration artifacts for the product teams and explaining specific changes and their impacts. CCoE engineers collaborated with the product teams on implementation and testing to ensure that Coinbase’s key services continued to function as expected. They also built tooling and made suggestions for centralized cost optimization, including Compute Savings Plans, Amazon Elastic Compute Cloud (Amazon EC2) Reserved Instances, and Amazon EC2 Capacity Reservations. While individual teams achieved their own successes, the CCoE team, in partnership with the AWS Account Team, assisted them along the way and made centralized changes to infrastructure, resulting in new cost-optimized standards for Coinbase services as a whole.
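To make the trade-off behind a commitment-based discount such as a Savings Plan concrete, here is a minimal sketch of the break-even arithmetic. It assumes a simplified model that is not from Coinbase's program: a single hourly commitment with one flat discount rate, and usage above the commitment ignored because it is billed on demand either way. The function names and the 30% discount in the usage note are hypothetical.

```python
def break_even_utilization(discount: float) -> float:
    """Fraction of a commitment that must be used for it to pay for itself."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return 1.0 - discount


def hourly_savings(covered_on_demand: float, discount: float, utilization: float) -> float:
    """Savings per hour versus pure on-demand (negative: the commitment costs more).

    covered_on_demand: on-demand value of the usage the commitment can cover ($/hour)
    utilization: fraction of that covered usage actually consumed, 0..1
    """
    break_even_utilization(discount)  # reuse the discount validation
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be in [0, 1]")
    committed_cost = covered_on_demand * (1.0 - discount)  # paid whether used or not
    on_demand_cost = covered_on_demand * utilization       # cost with no commitment
    return on_demand_cost - committed_cost
```

Under this model, a commitment with a hypothetical 30% discount only pays off when more than 70% of it is actually used, which is why commitments are usually sized against steady baseline usage rather than peaks.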
Meeting Structure and Cadence
Coinbase divided the project’s tactical components into three phases: kick-off, easy wins, and sustained performance. During the kick-off phase, the CCoE and Coinbase’s AWS Account Team collaborated to identify technical cost savings through Amazon EC2 rightsizing, Amazon Relational Database Service (Amazon RDS) reconfiguration, and optimization of AWS managed services. In the easy wins phase, the CCoE partnered with AWS Professional Services and implemented centralized configuration changes that impacted the entire infrastructure. Lastly, during the sustained performance phase, AWS Professional Services and the CCoE worked with individual Coinbase business units to optimize application configurations for the cloud.
Coinbase’s AWS Account Team partnered with the CCoE to implement several optimizations early on. In the sustained performance phase, AWS brought in Specialist Solutions Architects for services such as Amazon EC2, Amazon RDS, Amazon ElastiCache, Amazon CloudWatch, Amazon Simple Storage Service (Amazon S3), and Amazon Managed Streaming for Apache Kafka (Amazon MSK) to achieve even greater company-wide cost optimization.
Leveraging AWS Cost Optimization Services
Coinbase used several AWS cost optimization tools to track and analyze its costs and build an optimization strategy. AWS Trusted Advisor provided recommendations for cost optimization and rightsizing, and its Organizational View gave Coinbase insight across all of its AWS accounts in one place. Trusted Advisor provided actionable suggestions for services such as Amazon EC2, Amazon S3, Amazon Elastic Block Store (Amazon EBS), Elastic Load Balancing (ELB), and more.
Coinbase used AWS Compute Optimizer to provide an initial set of recommendations for Amazon EC2 rightsizing, which the CCoE used first to establish company-wide Amazon EC2 guidelines. Further service-specific recommendations were then carried out by Coinbase’s business units, guided by the CCoE.
AWS Cost Explorer helped Coinbase’s finance and CCoE teams to answer complex budget questions and address queries about cost anomalies and service-level trends. To provide a more Coinbase-specific and easily understood view of service costs for its engineers, AWS Professional Services implemented an Amazon QuickSight dashboard, leveraging Amazon Athena, to process the AWS Cost and Usage Report (CUR). This custom dashboard enabled Coinbase engineers to approach the cost optimization program effectively and use cost allocation tagging to identify workload-specific trends and patterns.
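As an illustration of the kind of query such a dashboard might sit on, the following sketch builds an Athena SQL statement that totals cost per cost allocation tag value and service. It is not Coinbase's actual query. The table name and tag key are hypothetical, and the column and partition names assume a standard Athena-integrated CUR table, where user-defined tags typically appear as `resource_tags_user_<key>` columns.

```python
import re

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_cost_by_tag_query(table: str, tag_key: str, year: int, month: int) -> str:
    """Build Athena SQL that sums unblended cost per tag value and service."""
    # Identifiers are interpolated into the SQL text, so accept only simple names.
    for name in (table, tag_key):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    tag_column = f"resource_tags_user_{tag_key.lower()}"
    return (
        f"SELECT {tag_column} AS tag_value,\n"
        "       line_item_product_code AS service,\n"
        "       SUM(line_item_unblended_cost) AS total_cost\n"
        f"FROM {table}\n"
        # Assumption: year and month are string partition columns on the CUR table.
        f"WHERE year = '{int(year)}' AND month = '{int(month)}'\n"
        "GROUP BY 1, 2\n"
        "ORDER BY total_cost DESC"
    )


if __name__ == "__main__":
    # "cur_table" and "team" are placeholder names for this sketch.
    print(build_cost_by_tag_query("cur_table", "team", 2022, 6))
```

Rows with an empty tag value in the result are untagged spend, which is often the first thing a tagging effort needs to shrink before per-workload trends become meaningful.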
These combined efforts resulted in significant cost optimizations across various AWS services.
For Amazon EC2, instance types were optimized to match Coinbase’s workloads, leading to better price-performance ratios. Coinbase re-engineered its tooling to become a multi-architecture company that can run workloads on both Intel and Arm processors, with 25% of the Coinbase fleet running on AWS Graviton instances today. Additionally, Amazon CloudWatch detailed monitoring was enabled on certain EC2 instances to fine-tune rightsizing, autoscaling, and new instance type selections.
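The idea of using utilization metrics to guide rightsizing can be sketched as a simple percentile check. This is not how AWS Compute Optimizer or Coinbase's tooling works internally; the 40% and 80% thresholds and the function names are hypothetical, and the input is assumed to be a list of CPU utilization samples (in percent) such as those CloudWatch reports.

```python
from typing import Sequence


def percentile_95(samples: Sequence[float]) -> float:
    """Nearest-rank 95th percentile of the samples."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]


def rightsizing_hint(cpu_percent: Sequence[float],
                     low: float = 40.0, high: float = 80.0) -> str:
    """Suggest a direction based on p95 CPU rather than the average."""
    p95 = percentile_95(cpu_percent)
    if p95 < low:
        return "downsize"
    if p95 > high:
        return "upsize"
    return "keep"
```

Using a high percentile instead of the mean keeps a mostly idle but periodically saturated instance from being flagged for downsizing. Finer-grained samples, such as those from detailed monitoring, make such peaks easier to see.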
For Amazon RDS, single instances were migrated to Amazon Aurora instances, leveraging Amazon Aurora Auto Scaling and enabling Coinbase to utilize scale-down capabilities during low-load periods. Coinbase was also able to migrate their Amazon RDS instances to take advantage of the price-performance advantages of AWS Graviton processors.
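Aurora Auto Scaling is configured through Application Auto Scaling, so a setup along these lines could be sketched as two request payloads: one registering the cluster's replica count as a scalable target, and one attaching a target tracking policy. The cluster identifier, replica bounds, and 60% CPU target below are hypothetical values, not Coinbase's settings.

```python
def aurora_replica_scaling_requests(cluster_id: str, min_replicas: int = 1,
                                    max_replicas: int = 4, target_cpu: float = 60.0):
    """Return (register_scalable_target kwargs, put_scaling_policy kwargs)."""
    if not 0 <= min_replicas <= max_replicas:
        raise ValueError("need 0 <= min_replicas <= max_replicas")
    common = {
        "ServiceNamespace": "rds",
        "ResourceId": f"cluster:{cluster_id}",
        "ScalableDimension": "rds:cluster:ReadReplicaCount",
    }
    target = {**common, "MinCapacity": min_replicas, "MaxCapacity": max_replicas}
    policy = {
        **common,
        "PolicyName": f"{cluster_id}-reader-cpu",  # hypothetical naming scheme
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_cpu,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "RDSReaderAverageCPUUtilization"
            },
        },
    }
    return target, policy


def apply_scaling(cluster_id: str) -> None:
    """Send both requests; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above runs without AWS access

    client = boto3.client("application-autoscaling")
    target, policy = aurora_replica_scaling_requests(cluster_id)
    client.register_scalable_target(**target)
    client.put_scaling_policy(**policy)
```

With a policy like this, replicas are removed as average reader CPU falls below the target, which is the scale-down behavior during low-load periods described above.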
For Amazon S3, Coinbase used lifecycle policies to migrate objects to different storage tiers based on their access frequency to reduce storage costs. Coinbase also enabled Amazon S3 Intelligent-Tiering as the default storage option across the company, so that objects are automatically moved to the storage class that matches their access patterns over time, keeping storage cost-effective without additional operational overhead.
For Amazon MSK, AWS Professional Services helped to horizontally scale in and vertically scale down Coinbase’s Amazon Managed Streaming for Apache Kafka (Amazon MSK) clusters.
Additional cost savings were achieved through Amazon CloudWatch and AWS CloudTrail rightsizing; rightsizing and upgrading Amazon ElastiCache clusters for improved Redis performance; Amazon EBS volume optimization by standardizing on gp3 volumes; and switching to Amazon DynamoDB provisioned capacity for high-IO workloads.
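A lifecycle rule of this kind can be sketched as the configuration document passed to the S3 API. The rule ID, the empty prefix (which matches every object), and the zero-day transition are assumptions made for illustration rather than Coinbase's actual policy.

```python
def intelligent_tiering_lifecycle(prefix: str = "", after_days: int = 0) -> dict:
    """Lifecycle configuration that transitions objects to S3 Intelligent-Tiering."""
    if after_days < 0:
        raise ValueError("after_days must be >= 0")
    return {
        "Rules": [
            {
                "ID": "move-to-intelligent-tiering",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},  # "" applies the rule to all objects
                "Transitions": [
                    {"Days": after_days, "StorageClass": "INTELLIGENT_TIERING"}
                ],
            }
        ]
    }


def apply_lifecycle(bucket: str) -> None:
    """Apply the rule to a bucket; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above runs without AWS access

    # Note: this call replaces any lifecycle configuration already on the bucket.
    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=intelligent_tiering_lifecycle(),
    )
```

Because the API call overwrites the existing configuration, a real rollout would normally read the current rules first and merge the new rule into them.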
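Standardizing on gp3 could be approached by planning an in-place volume type change for each remaining gp2 volume. The sketch below is an assumption about how such a plan might be built, not Coinbase's tooling. It takes volume records shaped like the entries returned by the EC2 `describe_volumes` call and produces keyword arguments for `modify_volume`, requesting extra IOPS only where the gp2 baseline exceeded the 3,000 IOPS that gp3 includes. Throughput is left at the gp3 default and would need a separate review for larger volumes.

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """gp2 baseline: 3 IOPS per GiB, floored at 100 and capped at 16,000."""
    return min(max(3 * size_gib, 100), 16000)


def plan_gp3_migrations(volumes: list) -> list:
    """Build modify_volume keyword arguments for every gp2 volume."""
    plans = []
    for vol in volumes:
        if vol["VolumeType"] != "gp2":
            continue  # already gp3 or a different volume family
        plan = {"VolumeId": vol["VolumeId"], "VolumeType": "gp3"}
        baseline = gp2_baseline_iops(vol["Size"])
        if baseline > 3000:
            # Keep IOPS parity for large volumes; smaller ones gain IOPS for free.
            plan["Iops"] = baseline
        plans.append(plan)
    return plans


def apply_plans(plans: list) -> None:
    """Submit the changes; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the planner above runs without AWS access

    ec2 = boto3.client("ec2")
    for plan in plans:
        ec2.modify_volume(**plan)
```

Separating the plan from its execution lets the list be reviewed, or applied to a few volumes first, before any change is made.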
Coinbase achieved a significant reduction in cloud costs through a strategic collaboration with AWS and by leveraging the CCoE model. Within a six-month period, Coinbase was able to optimize its business and implement best practices to prepare for expected future growth. The process demonstrated that strategic cloud spend management can provide a competitive advantage, allowing savings to be reinvested into the business and opening doors to innovation and new products for customers. With cost savings realized and an updated cloud architecture consistent with the latest AWS Well-Architected Framework, Coinbase can now expand globally to reach its next 1 billion users with the necessary reliability and scalability.