Reddit Migrates to Managed Amazon Aurora to Scale for 30% Year-over-Year Growth
Social media company Reddit was experiencing hyperscale growth, with its monthly active users surging 30 percent year over year. Consequently, a substantial operational burden associated with self-managing its relational database was constraining Reddit’s team and putting quality service delivery to its users at risk. To take pressure off its team, facilitate future growth, and meet its users’ needs, Reddit had two options: either invest in more resources and hire more engineers or migrate to a managed relational database service.
The Reddit team chose to migrate key workloads from self-managed PostgreSQL databases to Amazon Aurora, a MySQL- and PostgreSQL-compatible managed relational database service from Amazon Web Services (AWS) built for the cloud. Aurora combines the performance and availability of traditional enterprise databases with the simplicity and cost-effectiveness of open-source databases. For Reddit, the migration brought higher database reliability, faster backup restoration and point-in-time recovery, and automated failovers that complete in around 30 seconds. Most notably, by automating administrative tasks, the migration saved the team roughly 2 business days’ worth of time per month, freeing it to focus on higher-value tasks and future projects.
Finding a Solution for Operational Burdens
Reddit has been operating much of its infrastructure on AWS since 2009. Originally, to support data that powered Reddit.com and the company’s native mobile apps, Reddit used a self-managed PostgreSQL database on Amazon Elastic Compute Cloud (Amazon EC2), which provides secure, resizable compute capacity in the cloud, and Amazon Simple Storage Service (Amazon S3), an object storage service that offers industry-leading scalability, data availability, security, and performance. Amazon EC2 and Amazon S3 gave Reddit’s engineers complete control of their computing resources as well as easy-to-use management features that enabled them to organize their data and configure finely tuned access controls to meet business, organizational, and compliance requirements.
But as Reddit started to grow rapidly, reporting in 2019 a 30 percent year-over-year increase in monthly active users to 430 million, managing its own computing resources in Amazon EC2 and Amazon S3 began to present challenges. “I was spending a lot of time managing Amazon EC2 PostgreSQL instances, which prevented me from working on larger-scope efforts,” explains Jason Harvey, a principal engineer for Reddit. “We have a huge amount of data. If the instances go down, it takes a long time to do restores.” Though self-managing a database provides granular control over every aspect of operations, it requires highly skilled database administrators who spend much of their time operating and troubleshooting the database: configuring the software, installing patches and upgrades, investigating and resolving bugs, scaling the deployment for growth, and more. A managed service, in contrast, handles those tasks for users.
To relieve the operational burden on its engineers, Reddit looked to Aurora, managed by Amazon Relational Database Service (Amazon RDS)—which automates time-consuming administration tasks such as hardware provisioning and database setup, patching, and backups. In particular, Reddit’s engineers were interested in Aurora’s managed failovers, read replicas, and sharding capabilities. “We were looking for something that could take some of the operational burden of managing those instances off our engineers,” says Harvey. “Using Aurora, we could build a longer-term strategy for how we were dealing with data and do more strategic thinking for long-term projects.”
Migrating to a Managed Database
Reddit began its migration in January 2020. The company needed to move enormous amounts of data from 24 PostgreSQL databases on Amazon EC2 to Aurora using logical replication. The sheer scale of the migration was initially daunting: Reddit was founded in 2005 and today ranks as the 17th most visited website in the world. “We were moving every comment, every link, and every account that has ever existed on the site,” says Harvey. Even so, the migration went smoothly, because Aurora supports PostgreSQL’s native logical replication. “The native logical replication enabled us to replicate the databases from Amazon EC2 without having to transform them at all, which would have required more effort,” says Harvey.
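The case study does not describe Reddit’s exact replication commands. As a minimal sketch, assuming standard PostgreSQL logical replication (a publication on the source and a subscription on the target), the Python below generates the SQL one might run for each database. The database names, the `replicator` role, the host address, and the `_pub`/`_sub` naming scheme are all hypothetical.

```python
"""Sketch: the PostgreSQL statements behind native logical replication.

Hypothetical example: publication/subscription names, database names, the
replicator role, and the host are illustrative, not Reddit's actual setup.
"""


def publication_sql(publication: str) -> str:
    # Run on the source (self-managed PostgreSQL on Amazon EC2).
    # The source server must be configured with wal_level = logical.
    return f"CREATE PUBLICATION {publication} FOR ALL TABLES;"


def subscription_sql(subscription: str, publication: str, source_conninfo: str) -> str:
    # Run on the target (Aurora PostgreSQL). The schema must already exist on
    # the target, because logical replication copies rows, not DDL.
    # By default the subscription first copies existing table data and then
    # streams ongoing changes from the publication.
    escaped = source_conninfo.replace("'", "''")  # escape quotes for the SQL literal
    return (
        f"CREATE SUBSCRIPTION {subscription} "
        f"CONNECTION '{escaped}' "
        f"PUBLICATION {publication};"
    )


def migration_plan(databases: list[str], source_host: str) -> list[tuple[str, str, str]]:
    """Return (database, source_sql, target_sql) for each database to migrate."""
    plan = []
    for db in databases:
        pub = f"{db}_pub"  # hypothetical naming scheme
        sub = f"{db}_sub"
        conninfo = f"host={source_host} dbname={db} user=replicator"
        plan.append((db, publication_sql(pub), subscription_sql(sub, pub, conninfo)))
    return plan


if __name__ == "__main__":
    for db, on_source, on_target in migration_plan(["accounts", "comments"], "10.0.0.12"):
        print(f"-- {db}\n{on_source}\n{on_target}")
```

Because a PostgreSQL publication belongs to a single database, the sketch creates one publication and subscription pair per database; a migration of 24 databases would need 24 such pairs.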
Once the engineers had replicated Reddit’s databases, they tested the site’s data in production by pointing the applications’ reads directly at Aurora, validating that the database was functional and performed well enough. They then performed a write cutover, stopping writes to the self-managed PostgreSQL databases on Amazon EC2 and directing them to Aurora. Because Aurora is PostgreSQL compatible, the applications reached it through the same interface they had used for the self-managed databases, and the cutover produced no downtime. “That is just a benefit of having the exact same solution in terms of the application’s point of view,” says Harvey. “So that made the migration quite a bit easier.”
Since completing the migration in July 2020, Reddit’s engineers have been able to make better use of their time. On Aurora, they don’t have to worry about instance retirement, which means fewer manual failovers. Aurora transparently recovers from physical storage failures, and automated failovers now take around 30 seconds. Accordingly, Harvey expects less downtime from instance failovers.
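Automated failover helps most when applications reconnect cleanly. A minimal sketch, assuming the application connects through the Aurora cluster (writer) endpoint, which Aurora repoints to the new primary after a failover: retry the connection with capped exponential backoff for longer than the roughly 30 seconds a failover takes. The `connect` callable, the 60-second budget, and the delay values are assumptions; with a real driver you would catch that driver’s connection error (for example `psycopg2.OperationalError`) rather than the built-in `ConnectionError` used here.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    budget_seconds: float = 60.0,  # assumption: comfortably above a ~30 s failover
    initial_delay: float = 0.5,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Retry `connect` with capped exponential backoff until it succeeds or the budget runs out."""
    deadline = clock() + budget_seconds
    delay = initial_delay
    while True:
        try:
            return connect()
        except ConnectionError:
            # The writer is unavailable (for example, mid-failover).
            # Give up if the next wait would overrun the budget; otherwise wait and retry.
            if clock() + delay > deadline:
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

Injecting `sleep` and `clock` keeps the retry logic testable without real waiting, and capping the delay keeps reconnect attempts frequent once the new writer is up.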
As another benefit, Aurora has enabled Reddit’s engineers to speed up backup restoration and point-in-time recovery. To restore deleted data, Harvey says, he can now use point-in-time recovery to set up a new cluster, fetch the data, and restore it quickly. “This is something we were able to do under our old solution, but the new system makes it much easier. It’s more of a click of a button now in comparison to doing a bunch of manual moving using Amazon EC2.”
Aurora also lets Reddit’s engineers use database clones to make data available to other teams for analytics and experimentation. “In the past, if another team wanted to use data in the production databases, it had to launch it and manage its own Amazon EC2 instances, which it would frequently not have the time to do,” Harvey explains. “Since we moved to Aurora, those teams can now clone the databases and use them as needed without having to deal with the operational burden.”
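Both workflows map onto the Amazon RDS `RestoreDBClusterToPointInTime` API. A minimal sketch, assuming an Aurora PostgreSQL cluster and a boto3-style RDS client passed in as `rds_client`: a full-copy restore to a chosen time for recovering deleted data, and a copy-on-write restore from the latest restorable time for a clone. The cluster names, the instance naming scheme, and the `db.r6g.large` instance class are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any


def pitr_request(source_cluster: str, new_cluster: str, restore_to: datetime) -> dict[str, Any]:
    """Parameters for a full-copy restore of `source_cluster` as of `restore_to`."""
    if restore_to.tzinfo is None:
        raise ValueError("restore_to must be timezone-aware (UTC)")
    return {
        "SourceDBClusterIdentifier": source_cluster,
        "DBClusterIdentifier": new_cluster,
        "RestoreToTime": restore_to.astimezone(timezone.utc),
        "RestoreType": "full-copy",
    }


def clone_request(source_cluster: str, clone_cluster: str) -> dict[str, Any]:
    """Parameters for a copy-on-write clone of the cluster's latest state."""
    return {
        "SourceDBClusterIdentifier": source_cluster,
        "DBClusterIdentifier": clone_cluster,
        "UseLatestRestorableTime": True,
        "RestoreType": "copy-on-write",
    }


def restore(rds_client: Any, params: dict[str, Any], instance_class: str = "db.r6g.large") -> None:
    """Create the restored cluster, then add one instance so it can serve queries."""
    rds_client.restore_db_cluster_to_point_in_time(**params)
    # The restored cluster starts with no DB instances.
    # Assumption: depending on timing, the cluster may need to become available first.
    rds_client.create_db_instance(
        DBInstanceIdentifier=f"{params['DBClusterIdentifier']}-instance-1",  # hypothetical naming
        DBClusterIdentifier=params["DBClusterIdentifier"],
        Engine="aurora-postgresql",
        DBInstanceClass=instance_class,
    )
```

With real AWS credentials, `rds_client` could be `boto3.client("rds")`. Passing the client in, rather than creating it inside `restore`, keeps the request-building logic testable without AWS access.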
Achieving Unhindered Scalability
Migrating from self-managed PostgreSQL on Amazon EC2 to Aurora was a key component of Reddit’s success. It has brought not only higher database reliability, faster backup restoration and point-in-time recovery, and fast automated failovers but also improved productivity for engineers. Harvey and his team can now spend less time on administrative tasks and more on high-value, long-term goals such as revisiting their data storage strategy. With its key relational workloads on Aurora, which is now the company’s target database for all net-new relational workloads, Reddit can continue to scale at its 30 percent year-over-year growth rate without sacrificing the quality of the services and experiences it provides to its hundreds of millions of monthly active users.
Founded in 2005 by two college friends, Reddit is one of the most-visited social media websites in the world. Its 430 million global monthly active users share interests, news and entertainment stories, and more in 130,000 active communities generating over 30 billion monthly views.
Benefits of AWS
- Scaled database-management capabilities to support 30% year-over-year growth
- Reduced operational burdens on engineers and increased their productivity
- Improved database reliability
- Sped up backup restoration and point-in-time recovery
- Enabled fast automated failovers taking around 30 seconds
AWS Services Used
Amazon Aurora is a MySQL- and PostgreSQL-compatible relational database built for the cloud that combines the performance and availability of traditional enterprise databases with the simplicity and cost-effectiveness of open-source databases.