Paytm Modernizes Data Platform and Streamlines Data Processing Using Amazon EMR
“Our goal is to help people gain control of their finances, whether it’s providing loans to businesses, digital savings accounts for the underbanked, or payment methods for neighbourhood stores. We want to be the go-to financial services platform in India,” says Manmeet Dhody, chief technology officer of payments at Paytm.
Vice President & Head of Data Platform, Paytm
Future Proofing the Data Platform
Paytm’s business units and the merchants on its platform rely on insights derived from the company’s big data platform, such as user adoption, sales, and revenue generated, to inform business decisions. “As our data volumes grew, we needed a platform that could handle larger data workloads and provide our merchants as well as our product and business teams with the right data at the right time,” says Kumar.
To that end, Kumar and his team turned to Amazon Web Services (AWS) to rearchitect the company’s on-premises platform on the cloud. With AWS, the team found that it could use managed services, spending less time on running the on-premises platform, and build a solid foundation for a modern data lake to further improve the company’s data infrastructure.
Migrating the Data Platform to Amazon EMR
To address these challenges, Paytm’s data engineering team adopted Amazon EMR, a big data platform, to rearchitect its core ETL processing with low operational overhead.
Amazon EMR’s compatibility with Paytm’s pre-existing open source tools made it easy to set up, operate, and scale the company’s big data platform and to integrate it with the rest of its machine learning and artificial intelligence stack.
With Amazon EMR, Paytm can now securely process and scale data workloads with ease. The platform can spin up big data clusters and execute most of Paytm’s core ETL processing in as little as 10 minutes, down from as long as 12 hours previously. It can also shut down clusters when they are no longer needed, minimizing unnecessary infrastructure costs.
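The spin-up-then-shut-down pattern described above can be sketched with boto3, the AWS SDK for Python. This is a minimal illustration, not Paytm’s actual configuration: the function names, bucket paths, instance types, node counts, release label, and the default IAM role names are all assumptions made for the example. The piece that matters is `KeepJobFlowAliveWhenNoSteps=False`, which asks EMR to terminate the cluster once its steps have finished.

```python
from typing import Any, Dict


def build_transient_cluster_request(
    name: str,
    script_s3_uri: str,
    log_s3_uri: str,
    core_nodes: int = 2,
) -> Dict[str, Any]:
    """Build keyword arguments for the boto3 EMR `run_job_flow` call.

    All concrete values (release label, instance types, role names)
    are illustrative assumptions, not values from the case study.
    """
    spark_step = {
        "Name": "core-etl",
        # If the ETL step fails, stop paying for the cluster as well.
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_s3_uri],
        },
    }
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",  # assumed release
        "LogUri": log_s3_uri,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {
                    "Name": "primary",
                    "InstanceRole": "MASTER",
                    "InstanceType": "m5.xlarge",  # assumed size
                    "InstanceCount": 1,
                },
                {
                    "Name": "core",
                    "InstanceRole": "CORE",
                    "InstanceType": "m5.xlarge",  # assumed size
                    "InstanceCount": core_nodes,
                },
            ],
            # Transient cluster: terminate once all steps are done.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [spark_step],
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumed default roles
        "ServiceRole": "EMR_DefaultRole",
    }


def launch(request: Dict[str, Any]) -> str:
    """Submit the request; needs boto3, AWS credentials, and a region."""
    import boto3  # imported lazily so the builder works without boto3

    emr = boto3.client("emr")
    return emr.run_job_flow(**request)["JobFlowId"]
```

Calling `launch(build_transient_cluster_request("nightly-etl", "s3://example-bucket/jobs/etl.py", "s3://example-bucket/logs/"))` would start a cluster, run the one Spark step, and let EMR shut the cluster down afterwards. Keeping the request builder separate from the API call makes the configuration easy to inspect without touching an AWS account.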
“Amazon EMR provided us with exactly the tools and features we needed to build a future-proof data platform. As the provisioning of capacity and scaling of clusters is managed by Amazon EMR, we can now deliver data to our business users 30 percent faster and at 70 percent of the cost of our on-premises solutions. Even more so, AWS supported our motivated data engineering team to expedite and complete this complex and critical system migration in record time, in under 45 days instead of a few quarters,” says Kumar.
Empowering the Data Team
“Rearchitecting our platform on Amazon EMR empowered our data engineering team to move away from resolving platform incidents and focus more on Paytm’s core business. In fact, we are already thinking about making further improvements with AWS. With our data volumes expected to grow quickly, our next step is to integrate all our data into a single data lake, allowing us to maintain the quality of our data in line with business growth,” concludes Kumar.
Benefits of AWS
- Reduced infrastructure management and processing incidents by 70 percent
- Reduced data processing time for the majority of workloads by 98 percent
- Improved data availability by 30 percent
- Reduced data infrastructure cost by 30 percent
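As a quick sanity check, the headline percentages can be reproduced from the figures quoted earlier in the text. The sketch below assumes the best-case comparison (the full 12 hours before and the 10-minute minimum after) and treats "70 percent of the cost" as 70 cost units against an on-premises baseline of 100. The helper name `percent_reduction` is introduced here for illustration only.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return how much smaller `after` is than `before`, in percent."""
    return (before - after) * 100 / before


# ETL runtime: up to 12 hours before, as little as 10 minutes after.
etl_reduction = percent_reduction(before=12 * 60, after=10)

# Cost: the new platform runs at 70 percent of the on-premises cost.
cost_reduction = percent_reduction(before=100, after=70)

print(f"ETL processing time reduced by about {etl_reduction:.1f} percent")
print(f"Infrastructure cost reduced by {cost_reduction:.0f} percent")
```

Under those assumptions the runtime reduction comes to roughly 98.6 percent, which matches the 98 percent listed above, and the cost reduction comes to 30 percent.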
AWS Services Used
Amazon EMR is the industry-leading cloud big data platform for processing vast amounts of data using open source tools such as Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi, and Presto.