Flipboard Boosts Data Performance by 300% with Amazon EMR
Executive Summary
Flipboard, a community-focused digital media curation platform, uses several open-source applications for critical parts of its data architecture, from data processing to querying. Running its infrastructure on Amazon Web Services (AWS) has allowed the company to grow over the years. However, leadership wanted to modernize its data systems further to be future-ready and reduce the time spent upgrading its open-source tools. With the help of data solutions provider Mactores, Flipboard selected Amazon EMR backed by Amazon Simple Storage Service (S3) to scale more efficiently, increase stability, enhance performance, and cut its technical debt nearly in half.
Legacy Software Posed Growth Challenges
Serving more than 100 million active users seeking curated content about every subject imaginable, Flipboard’s data environment sees up to 300,000 reads per second and 120,000 writes per second. While the company is built on AWS, much of its environment relies on an open-source framework and applications, such as Apache HBase, to collect, process, analyze, and manage big data. The company initially stored its data in Amazon Elastic Block Store (Amazon EBS) volumes local to Amazon Elastic Compute Cloud (Amazon EC2) instances and also used a third-party data lake platform. But working with five-year-old versions of software and multiple data stores that ran on aging hardware created time-consuming hurdles for the engineering team.
“We wanted to upgrade to something more resilient and take much of the configuration management that open-source technologies require off our hands. But the path to do that was unclear,” said Cris Favero, data engineering manager at Flipboard. “We lacked the expertise to design and architect it, and even if we had it, we didn’t have the bandwidth to deploy it.”
Automation Reduces Time Spent on Configuration by 70%
AWS recommended Mactores, an AWS Advanced Consulting Partner specializing in data analytics and machine learning solutions. Following a detailed assessment of Flipboard’s core technology and a proof of concept, the two companies agreed on a migration plan. Mactores would move Flipboard off its open-source hybrid data platform running on Amazon EC2 and host its Apache HBase workloads on Amazon EMR backed by Amazon Simple Storage Service (Amazon S3). It would also replace Flipboard’s older data preparation and transformation platform with Apache Hive on Spark powered by Amazon EMR.
Flipboard chose Amazon EMR for four primary reasons. First, it separates storage from compute by keeping data in Amazon S3, an object store, which would allow Flipboard to scale compute and storage independently. Second, Amazon EMR automates the configuration management that Flipboard’s open-source tools require. Third, Amazon EMR would automatically upgrade workloads to the latest version, enhancing system reliability. Finally, Amazon EMR supports the latest processors, such as AWS Graviton, which would let Flipboard seamlessly benefit from newer hardware and more compute power.
“Amazon EMR is an out-of-the-box solution that quickly updates clusters without much manual effort,” said Balkrishna Heroor, senior principal consultant at Mactores. “It’s much simpler for Flipboard to upgrade its clusters to the newest HBase versions because the Amazon EMR platform already does most configuration changes and the S3 integration.”
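The storage and compute separation described above is typically switched on through two Amazon EMR configuration classifications that point HBase at an S3 root directory. The following is a minimal sketch of how such a configuration could be assembled, not Flipboard's or Mactores' actual setup; the function name and the bucket path are hypothetical placeholders.

```python
import json


def hbase_on_s3_configurations(root_dir):
    """Build EMR configuration classifications that run HBase with S3 storage.

    root_dir is a hypothetical S3 URI, e.g. "s3://example-bucket/hbase-root".
    """
    if not root_dir.startswith("s3://"):
        raise ValueError("root_dir must be an s3:// URI")
    return [
        # Tell HBase where its root directory lives in S3.
        {
            "Classification": "hbase-site",
            "Properties": {"hbase.rootdir": root_dir},
        },
        # Tell EMR that HBase should use S3 as its storage mode.
        {
            "Classification": "hbase",
            "Properties": {"hbase.emr.storageMode": "s3"},
        },
    ]


if __name__ == "__main__":
    # Placeholder bucket name, for illustration only.
    configs = hbase_on_s3_configurations("s3://example-bucket/hbase-root")
    print(json.dumps(configs, indent=2))
```

Because the data lives in S3 rather than on the cluster's own volumes, the cluster nodes can be resized or replaced without moving the stored tables.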
Strong Apache HBase Region Recovery Yields 30% More Resilience at Half the Cost
Running Amazon EMR on Amazon S3 improves Flipboard's resiliency and HBase region recovery. If there is region corruption, Flipboard can rapidly add new nodes to bring up new instances and start caching data immediately. Amazon EMR can load objects into Apache HBase regions if the primary region servers go down.
“Amazon EMR backed by S3 offers an Apache HBase region recovery solution without having to buy expensive products,” Heroor said. “By separating S3 object storage from the storage compute, we can rapidly upgrade EBS volumes when we need higher capacity with faster throughput, thoroughly modernizing Flipboard’s HBase region availability strategy.”
Now, not only is managing these types of instances easier for Flipboard, but it also needs fewer of them. By cutting the time spent on upgrades, maintenance, and workload optimization, and by drawing on the benefits of AWS technology, Mactores reduced Flipboard's technical debt by 40%.
300% Performance Improvement Drives User Engagement
The re-architecture has benefited both Flipboard’s users and its development teams. Now that Amazon EMR stores all of Flipboard’s Apache HBase data, including storyboards, magazines, comments, and likes, users can enjoy real-time, reliable access to data.
Because Flipboard can manage its data more efficiently, the content that its users see is more relevant and accurate. Overall, the modernized architecture that Mactores helped Flipboard achieve using Amazon EMR and S3 has improved data performance by 300%.
With Amazon EMR backed by S3, many previous development barriers are gone, facilitating faster production. For example, the engineering team can spin up a development cluster and start testing, iterating, and scaling within 30 minutes, without devoting significant resources to configuration and file management.
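To make the idea of spinning up a development cluster concrete, the sketch below assembles a request in the shape accepted by the Amazon EMR RunJobFlow API. It is an illustration under stated assumptions, not Flipboard's actual setup: the function name, cluster name, release label, instance type, and node count are all hypothetical placeholders, and the two role names are the EMR default roles.

```python
def dev_cluster_request(name, release_label, instance_type, core_count):
    """Assemble a RunJobFlow-style request for a small HBase development cluster.

    All argument values are chosen by the caller; nothing here reflects
    Flipboard's real cluster sizes or versions.
    """
    if core_count < 1:
        raise ValueError("core_count must be at least 1")
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "HBase"}],
        "Instances": {
            "InstanceGroups": [
                # One primary node that coordinates the cluster.
                {
                    "Name": "primary",
                    "InstanceRole": "MASTER",
                    "InstanceType": instance_type,
                    "InstanceCount": 1,
                },
                # Core nodes that host the HBase region servers.
                {
                    "Name": "core",
                    "InstanceRole": "CORE",
                    "InstanceType": instance_type,
                    "InstanceCount": core_count,
                },
            ],
            # Keep the cluster running so developers can iterate on it.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


if __name__ == "__main__":
    # Placeholder values: a Graviton-based instance type and an example release.
    request = dev_cluster_request("hbase-dev", "emr-6.8.0", "m6g.xlarge", 2)
    print(request["Instances"]["InstanceGroups"][1]["InstanceCount"])
```

With AWS credentials in place, a request like this could be submitted through the boto3 EMR client, for example `boto3.client("emr").run_job_flow(**request)`.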
“We’re more resilient, and Flipboard’s overall user engagement has improved as a direct result of this migration,” said Greg Scallan, vice president of engineering at Flipboard. “The user experience with the app is faster because the API responses are quicker, and we can deliver new features more quickly.”
About Flipboard
Flipboard is an award-winning curation platform where people go to stay informed and inspired. Every day, millions of people use Flipboard to discover articles, videos, podcasts, and products curated to their interests.
About Mactores
Mactores is a trusted leader among businesses in providing modern data platform solutions. Since 2008, Mactores has enabled businesses to accelerate their value through automation.
Published October 2022