The Great Migration: Chime’s Move to AWS
This blog post is co-authored in partnership with the Chime Engineering team.
Mike Barrett, VP of Engineering Services at Chime, remembers when he first heard the term “undifferentiated heavy lifting.” Jeff Bezos, CEO of Amazon.com, coined the term in a 2006 presentation at MIT to describe all the hard IT work that companies do (hosting, managing bandwidth, overseeing data center hardware, and general backend work) that doesn’t add value to the company’s mission. “By making every part of engineers’ jobs a little, or a lot, easier, Amazon Web Services (AWS) helps companies like Chime focus on their core competencies and deliver better products and services,” Barrett says.
In 2019, Chime took a look in the mirror to clearly define its core competencies as a fast-growing financial technology company that provides services to partner banks. “When we started to look to the future and the growth we were anticipating, both in our number of members and in our team, we recognized a pressing need for a solution that would abstract away our infrastructure requirements and let us focus on the work that matters to achieving our mission and serving our members,” says Barrett. That solution was a migration to AWS, which was completed in 2021. Here’s a look at the team and the work behind Chime’s great migration.
Meet the Team and Their Roles
Barrett joined Chime in early 2020 after watching a “Last Week Tonight” episode on the state of payday loans. In his role, he oversees Engineering Services, the part of Chime Engineering that empowers all of Chime’s engineers to do what they love, writing code and providing value to members, with less effort.
Ethan Erchinger, a Staff Software Engineer on the Engineering Services team, has been a Chimer since 2014. In the early days, he helped build out Chime’s production and data engineering infrastructure and has continued to work across Engineering Services with a focus on infrastructure ever since. Once a director and manager, Erchinger now focuses on providing consultative services across the Engineering organization.
Keerthi Nallani, a Senior Software Engineer, joined the Scale team in early 2019. She’s worked on scaling Chime’s infrastructure and improving application performance—a departure from her former days coding web application features. She loves removing bottlenecks and dependencies for fellow engineers and ensuring Chime can grow sustainably.
Kevin Olson was an Engineering Manager on the Infrastructure team at the time of the migration (he’s since become the founding Engineering Manager on the Financial Platform team). He joined Chime in 2020, a month before the pandemic began. His role was to understand what the different parts of Chime’s Engineering organization needed from the AWS migration and to build the cross-functional relationships required to support it. He loves empowering engineers to step into the roles they’re needed in and helping all teams be more effective.
Satya Palani is the Director of Infrastructure Engineering and Developer Experience at Chime. His main focus is on making sure that Chime is available to its members by building and maintaining its infrastructure. He partners with engineering teams to ensure they’re supported in their operations and have the tools and processes necessary to succeed. For the AWS migration, he managed one of the engineering teams that worked on it, helping them develop a solid rollout and rollback plan. He joined the Chime team in August 2020.
Making the Migration a Priority
In planning for the future of Chime, which would involve building a financial platform, the team realized that their infrastructure was missing some of the pieces needed to build the systems for a successful platform. At the time, Olson was new to the company in a leadership role and had started doing one-on-ones with managers. “I noticed that they’d often call out things that would be so much easier with the AWS migration complete, but there wasn’t a clear roadmap for when it would be done,” he says. Chime was then hosted in a vendor-managed data center, no team was dedicated to the migration, and it wasn’t a priority on the team’s roadmap.
As usage of Chime’s app and services was rapidly growing, they realized that scaling their infrastructure in the data center at the same speed as their member base would be impossible. They also wanted to ensure that their infrastructure was as resilient as possible, because availability is critical for their members. Olson says, “If Chime goes down, members can’t access their money—that’s a core function of Chime and we needed to make it a priority.”
At that time, the migration to AWS was made a company-wide priority to provide scalability and self-service for all of Chime Engineering. “We have a team that lives and breathes our product, but we needed people who would live and breathe the tools that make building, deploying, and supporting that product easy—all of which results in better services for our members,” Barrett says. “When we decided to complete the AWS migration, we were committing to making that a priority and a reality.”
Gain the ability to rapidly scale to provide better service for members with uptime, speed, and reliability. The vast and dynamic capacity of the AWS Cloud allows Chime to provision new infrastructure at a moment’s notice and provides more geographic resilience than traditional data centers.
A seamless migration for members. “A primary requirement for our migration was that our members do not experience any interruptions,” says Palani. Protecting data integrity, designing a seamless cutover process, and performing extensive pre- and post-migration testing were some of the critical components of the migration plan.
“By being able to quickly scale our infrastructure, we ensure that Chime’s members always have the best possible experience.” – Satya Palani, Director of Infrastructure Engineering and Developer Experience at Chime
Improve engineering productivity. Migrating to AWS is a lever that makes everything else easier: if every engineer can be 10–20% more efficient, that’s a massive impact at scale. What’s more, the public cloud makes it easy to treat infrastructure as software, which lets engineers set up environments quickly and gives Chime access to the many managed services that AWS provides, reducing the human cost of setting up and operating common services like databases and caching layers.
The first step was to prioritize the project ruthlessly and then get the team in place. The team was split into smaller groups based on concentration. “We ensured that they were teams, not just single individuals, so we could all share knowledge,” Barrett explains. Different groups focused on things like the Redis migration, database migration, Kubernetes, and CI/CD flow. This helped teams focus and become experts in their areas of the project. It also gave clarity on who to contact for questions and built accountability and responsibility across the teams.
No large technical project is without its challenges. Here are a few that Chime faced in their migration to AWS:
Company growth: From start to finish of the migration, Chime’s data, customers, and number of employees each grew by up to 20x. The company went from two engineering managers to 25! “It was a constantly shifting landscape and a challenge to keep the business running, and growing, in our legacy software while we evaluated new technologies,” says Erchinger.
Coordination: Changing a car’s engine while it’s in motion is no small feat, and the team underestimated the interconnectedness of all of the systems at first. “There were terabytes of data to migrate, and we needed air traffic control despite our very cautious approach,” says Olson. The team coordinated by reverse-engineering the system to understand how everything was connected, with a detailed battle plan and steps to follow. Then they focused on migrating 1–2 services at a time to see what would happen.
Moving solutions: “We’d grown deep roots deploying in a data center, so moving solutions wasn’t easy,” says Erchinger. For example, when they moved to Kubernetes (from on-host deployment to Dockerized deployment), the team spent five months moving from one caching solution to another and understanding how to support the multi-Region consistency needed for the move. They took care in moving solutions like caching because they wanted members’ experience to be consistent and reliable: if they were wrong about where data was written and how replication was happening, they might lose consistency and, for example, show a member an incorrect balance.
Making tradeoffs: “As an organization, we had to punt some features and optimizations until AWS was available for Chime Engineering,” explains Nallani. “This felt frustrating at times, but it was necessary for us to have all hands on deck and to not further complicate the migration process.”
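The coordination approach described above (map how the systems connect, then migrate one or two services at a time) can be sketched as a dependency-ordered plan. This is only an illustration, not Chime’s actual tooling: the service names, the `migration_waves` function, and the rule that a service moves only after everything it depends on has moved are all assumptions made for the example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def migration_waves(dependencies, max_per_wave=2):
    """Group services into small waves so dependencies move first.

    `dependencies` maps each service to the set of services it depends on.
    Each wave holds at most `max_per_wave` services.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        # Services whose dependencies have all been migrated already.
        ready = sorted(sorter.get_ready())
        for start in range(0, len(ready), max_per_wave):
            waves.append(ready[start:start + max_per_wave])
        sorter.done(*ready)
    return waves


# Hypothetical service map, for illustration only.
services = {
    "api": {"cache", "db"},
    "worker": {"db"},
    "cache": set(),
    "db": set(),
}

for number, wave in enumerate(migration_waves(services), start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

Keeping each wave small limits how much can go wrong at once, and the ordering makes it explicit which systems must already be running in the new environment before the next wave starts.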
“Migrations like these are bound to have downtime,” says Nallani. “But we had zero downtime. It was both amazing and anticlimactic.” The reason? Instead of doing a post-mortem, the team did a pre-mortem to understand, beforehand, what might go wrong. Usually, a migration of this magnitude would involve shutting down the system, but the team was able to get the dual-write latency down to milliseconds, which is how they achieved zero downtime.
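For readers unfamiliar with dual writes, here is a minimal sketch of the general pattern, assuming a simple key-value interface. It is not Chime’s implementation, which the post does not describe; `InMemoryStore`, `DualWriter`, and the cutover check are hypothetical names invented for this example. Every write goes to the legacy store and is mirrored to the new one, reads stay on the legacy store until cutover, and cutover is refused while the two stores disagree.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for a real database or cache client."""
    data: dict = field(default_factory=dict)

    def put(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)


class DualWriter:
    """Writes to the legacy store, then mirrors the write to the new store."""

    def __init__(self, legacy, new):
        self.legacy = legacy
        self.new = new
        self.read_from_new = False  # flipped at cutover
        self.failed_mirrors = []    # keys to reconcile before cutover

    def put(self, key, value):
        self.legacy.put(key, value)  # source of truth until cutover
        try:
            self.new.put(key, value)
        except Exception:
            # A failed mirror write must not fail the member-facing request.
            self.failed_mirrors.append(key)

    def get(self, key):
        store = self.new if self.read_from_new else self.legacy
        return store.get(key)

    def mismatches(self):
        """Keys whose value in the new store differs from the legacy store."""
        return sorted(
            key for key, value in self.legacy.data.items()
            if self.new.get(key) != value
        )

    def cut_over(self):
        if self.failed_mirrors or self.mismatches():
            raise RuntimeError("stores differ; reconcile before cutover")
        self.read_from_new = True


writer = DualWriter(InMemoryStore(), InMemoryStore())
writer.put("member:1:balance", 100)
writer.cut_over()  # safe here because both stores hold identical data
print(writer.get("member:1:balance"))
```

Because both stores already hold the same data at cutover, switching reads is a flag flip rather than a bulk copy, which is why this pattern can avoid a maintenance window. The lag between the two writes is the window in which the stores can disagree, so keeping it small matters.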
The Future of Engineering on AWS at Chime
Migrating to AWS means several things for engineering at Chime. “The migration empowers our team to be less tied to the past and think more about the future,” says Olson. “With AWS, if we want to deploy a new service or architecture, we can do it in hours or days rather than months of work orders.” AWS also gives them visibility into the system’s health, with a plethora of logs and metrics to dig into that will help them scale.
Building on AWS also means supported services. From databases to caching, messaging, deployment infrastructure, and more, it’s much easier to do a ton of things without in-house expertise because AWS offers a broad range of services. Finally, Chime’s migration to AWS removes infrastructure engineering as a bottleneck. The self-service model takes the pressure off the infrastructure team to provision and maintain systems. Instead of teams having to partner with infrastructure to build a new tool or service, developers have greater control to build and manage infrastructure through AWS automation, security, and templates. “AWS unlocks our ability to move projects forward and makes scaling easier on our infrastructure and engineering organization in general thanks to the breadth of services it gives us access to,” says Erchinger.
Perhaps most importantly, the AWS migration helps us provide a better Chime for our members. A guiding value at Chime is to be member-obsessed, so everything we do is ultimately to serve them. The migration to AWS will help us provide better services, more uptime, reliable data, security, and a sound foundation for us to build the future of financial health upon.