PBS speeds deployment and reduces costs with AWS Fargate
This blog post was co-authored by Mike Norton – VP Cloud Services & Operations, PBS, Warrick St. Jean – Sr. Director Solution Architect, PBS, and Brian Link – Director, Technical Operations, PBS
PBS is a private, nonprofit corporation, founded in 1969, whose members are America’s public TV stations. PBS has been an AWS customer for over 10 years and uses around 100 AWS services. This post covers PBS’s 10-year journey in the cloud and how they evolved to use Amazon Elastic Container Service (Amazon ECS) and AWS Fargate to optimize their resilience, scalability, cost, and application development.
In 2009, the PBS cloud journey began by utilizing Amazon Elastic Compute Cloud (Amazon EC2) and RightScale to address their infrastructure requirements. As they progressed, PBS incorporated AWS OpsWorks, using Chef 12 and its recipes. AWS OpsWorks proved to be a useful tool for certain workloads, which allowed PBS to manage their infrastructure effectively.
However, as PBS’s needs evolved, they encountered specific use cases where AWS OpsWorks was not the best fit. For instance, they experienced challenges with scaling during sudden, unpredicted increases in demand, especially for breaking news or unforeseen events. Additionally, load-based instances sometimes scaled slowly due to Chef 12 recipe execution during instance startup.
Despite these challenges, the PBS team demonstrated its adaptability by creating a base Amazon Machine Image (AMI) for their web application and database. This initiative significantly reduced the time to deploy recipes during production scaling events, which allowed them to address scaling challenges more efficiently.
Furthermore, PBS occasionally had to overprovision resources to ensure service levels were maintained during peak hours across the country. This experience provided them with valuable insights into managing and planning for high-traffic periods.
While AWS OpsWorks facilitated PBS’s system upgrades, they observed that transitioning from Ubuntu 14 to Ubuntu 16 for each individual application could be time-consuming. Nevertheless, this process enhanced their ability to handle operating system transitions, further preparing them for future challenges.
Regarding security practices, PBS’s initial AWS OpsWorks design involved storing secrets in code on the server and in a Git repository accessible to Ops personnel. While this approach may not align perfectly with current security best practices, it did help PBS recognize the importance of robust security measures.
Despite these considerations, PBS’s experience with AWS OpsWorks was instrumental in shaping their learning journey. It encouraged their team to innovate, adapt, and improve their infrastructure management capabilities in the cloud. While OpsWorks may not have been the ideal fit for all of PBS’s use cases, they deeply appreciate the valuable experiences and insights it provided throughout their cloud journey.
Their adoption of containers in 2017 had a big impact on their migration to serverless. Firstly, they had a significant number of software engineers with existing codebases who needed to familiarize themselves with the concept of going serverless. This meant transitioning from the traditional approach of building and running services on Amazon EC2 to a completely different paradigm.
The initial motivation stemmed from the developers’ experiences. The aim was to enable new developers joining the team to quickly onboard and get applications running on their laptops. By utilizing Docker, this onboarding process became much faster and more streamlined. Instead of grappling with missing libraries or tools, developers simply needed to install their preferred integrated development environment (IDE) and Docker, then run docker compose up. Instantly, the application would be running.
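As an illustration of that onboarding flow, a minimal Docker Compose file for a web application with a database might look like the following. This is a hypothetical sketch, not PBS’s actual configuration; the service names, image, and port values are placeholders:

```yaml
# docker-compose.yml — hypothetical local development stack
services:
  web:
    build: .              # build the application image from the repo's Dockerfile
    ports:
      - "8000:8000"       # expose the app on localhost:8000
    environment:
      DATABASE_URL: postgres://app:app@db:5432/app
    depends_on:
      - db
  db:
    image: postgres:15    # throwaway database container for development only
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: app
      POSTGRES_DB: app
```

With a file like this checked into the repository, a new developer runs docker compose up and gets the entire stack running locally in one command.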
AWS Fargate wasn’t yet available as a viable option during that period. Hence, their proposed solution focused on utilizing Amazon ECS on Amazon EC2 with a consistent approach. They began with the idea of a single Amazon ECS cluster backed by Amazon EC2 instances for production, staging, and development environments. Within this cluster, they planned to have multiple container instances accommodating around 30 applications.
However, one challenge they initially encountered was the lack of a built-in auto-scaling feature for Amazon ECS container instance hosts at that time. As a result, they had to craft their own solution to this challenge.
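A common pattern from that era, before ECS capacity providers existed, was to scale the container-instance Auto Scaling group from the cluster’s CloudWatch reservation metrics. The sketch below is illustrative only, not PBS’s actual solution; the cluster name, threshold, and policy ARN are placeholders:

```shell
# Hypothetical sketch: alarm on cluster memory reservation and trigger a
# scale-out policy on the Auto Scaling group that hosts the ECS instances.
aws cloudwatch put-metric-alarm \
  --alarm-name ecs-cluster-memory-high \
  --namespace AWS/ECS \
  --metric-name MemoryReservation \
  --dimensions Name=ClusterName,Value=my-cluster \
  --statistic Average --period 60 --evaluation-periods 2 \
  --threshold 75 --comparison-operator GreaterThanThreshold \
  --alarm-actions "$SCALE_OUT_POLICY_ARN"
```

A matching alarm on a low threshold would drive a scale-in policy, closing the loop that Amazon ECS did not yet provide out of the box.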
When it came to upgrades, Amazon ECS on Amazon EC2 provided a more streamlined experience compared to their previous use of AWS OpsWorks. Upgrades primarily involved updating the Auto Scaling group to use the latest Amazon Linux AMI or Ubuntu AMI, followed by relaunching the instances sequentially. This upgrade process was further simplified with the advent of the instance refresh feature in Auto Scaling groups, which allowed for gradual instance replacements.
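As a hedged illustration of that workflow, after pointing the Auto Scaling group’s launch template at the new AMI, a single command rolls the fleet gradually; the group name and preference values below are placeholders:

```shell
# Hypothetical sketch: replace ECS container instances gradually, keeping at
# least 90% of the group healthy while each new instance warms up.
aws autoscaling start-instance-refresh \
  --auto-scaling-group-name ecs-cluster-asg \
  --preferences MinHealthyPercentage=90,InstanceWarmup=120
```

The refresh replaces instances in batches, so running tasks drain onto new hosts without a disruptive all-at-once swap.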
Consequently, the team effectively tackled their scaling issues and acquired the capability to scale more swiftly, accommodating a variety of irregular traffic patterns. The shift to containers reduced their day-to-day administrative overhead. Containerized applications turned out to be more stable, leading to fewer emergency situations. The team didn’t find it necessary to expand or enhance capacity to handle the growing workload.
Migrating to Amazon ECS brought about a dramatic improvement in the team’s experience during major premieres. Previously challenging aspects became non-issues with the adoption of Amazon’s event management capabilities. A particularly beneficial outcome was the harmonization of developer environments, staging environments, and production environments, all of which embraced a Docker-based approach.
The team also reaped benefits when it came to periodic tasks that required access to the codebase. Previously, they had to maintain an additional instance in Amazon EC2 and AWS OpsWorks for this purpose. However, with the introduction of scheduled tasks in Amazon ECS, they achieved the same objective with reduced overhead. Their storage management also underwent significant enhancements. Initial challenges with persistent storage were addressed, which allowed applications to be relaunched with mounted storage. The team further advised developers to leverage Amazon Simple Storage Service (Amazon S3) for its durability, reliability, lifecycle management, and cost savings. This resulted in a notable reduction in development time, and they strengthened security by using AWS Secrets Manager to securely store their secrets.
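To illustrate the Secrets Manager integration, an Amazon ECS task definition can inject a secret as an environment variable at launch, so no credential ever lives in code or in the image. The fragment below is a hypothetical example; the family, image, and secret ARN are placeholders:

```json
{
  "family": "periodic-report",
  "containerDefinitions": [
    {
      "name": "app",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/app:latest",
      "secrets": [
        {
          "name": "DB_PASSWORD",
          "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:app/db-password"
        }
      ]
    }
  ]
}
```

A task definition like this can then be run on a schedule as an Amazon ECS scheduled task, replacing the always-on instance that previously handled periodic jobs.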
A core principle the team upheld was viewing containers as cattle, rather than pets. This minimized the necessity for continuous adjustments and fine-tuning. The transition to AWS Fargate offered the advantage of bypassing extensive repair efforts on malfunctioning containers. Nonetheless, on the few occasions where intervention became essential, ECS Exec proved to be an invaluable tool.
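For those rare interventions, ECS Exec opens a shell inside a running task without any SSH access to underlying hosts. The command below is a generic sketch; the cluster, task ID, and container name are placeholders, and the task must have been launched with execute-command enabled:

```shell
# Hypothetical sketch: open an interactive shell in a running container.
aws ecs execute-command \
  --cluster my-cluster \
  --task <task-id> \
  --container app \
  --interactive \
  --command "/bin/sh"
```

Because the session goes through AWS Systems Manager, access can be audited and controlled with IAM rather than shared keys.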
The team also underwent a more nuanced migration from Amazon ECS on Amazon EC2 to Amazon ECS on AWS Fargate, which brought its own suite of benefits. With AWS Fargate, concerns about AMI upgrades for the underlying Amazon EC2 instances became obsolete.
Scaling with AWS Fargate was both swifter and more straightforward, which ended worries about overprovisioning. The team was freed from launching large Amazon EC2 instances just to accommodate a handful of additional containers. AWS Fargate’s design allowed them to allocate resources with precision, and the introduction of Savings Plans added more versatility, letting them commit to a fixed spending rate per hour, independent of container size or architecture. As they continued their journey, they found that Amazon ECS on AWS Fargate presented comprehensive answers to their challenges and emerged as a more user-centric option. Its integration streamlined their operations and greatly elevated their overall experience.
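To make the precision point concrete, here is a back-of-the-envelope sketch with made-up sizes (0.5 vCPU per container, 4 vCPU instances); it is purely illustrative, not PBS’s actual numbers:

```shell
# Hypothetical comparison: vCPU paid for when packing containers onto
# fixed-size EC2 instances vs. sizing each Fargate task exactly.
containers=13
per_container_mvcpu=500   # 0.5 vCPU per container, in milli-vCPU
instance_vcpu=4           # fixed EC2 instance size in the cluster

need_mvcpu=$((containers * per_container_mvcpu))   # 6500 milli-vCPU needed

# On EC2 you must round up to whole instances (integer ceiling division).
instances=$(( (need_mvcpu + instance_vcpu * 1000 - 1) / (instance_vcpu * 1000) ))

echo "EC2 vCPU paid for: $((instances * instance_vcpu))"   # 8 vCPU (2 instances)
echo "Fargate milli-vCPU paid for: $need_mvcpu"            # 6500 (exactly what tasks request)
```

Even in this toy example, the EC2-backed cluster pays for 8 vCPU to serve a 6.5 vCPU workload, while Fargate bills only for what each task requests.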