Simple Change Cuts Mercado Libre Compute Costs 31% Without Hurting Performance
One way that companies using Amazon Web Services (AWS) can reduce their spending is by taking advantage of Amazon Elastic Compute Cloud (EC2) Spot Instances, which offer unused AWS compute capacity at a significantly lower cost than other Amazon EC2 purchasing options. That prospect led Mercado Libre, an e-commerce platform with 211 million registered users across 18 Latin American countries, to consider incorporating Amazon EC2 Spot Instances into its hybrid architecture.
The result: 20 percent in savings the first month, with no impact on developer experience or business trajectory.
“In the first month, we saved 20 percent by moving to Amazon EC2 Spot Instances.”
– Gabriel Eisbruch, Senior Manager of Architecture, Infrastructure, and DBA Teams, Mercado Libre
About Mercado Libre
Mercado Libre, an e-commerce platform in Latin America, serves more than 200 million users in 18 countries. The platform enables its users to buy and sell products through auctions or at fixed prices, make secure payments, and access credit.
- Saved 20% in first month
- Saving 31% monthly
- Shifted 30% of Reserved and On-Demand Instances to Spot Instances in one week
Enabling Sustainable Growth
Mercado Libre was looking for ways to cut the cost of its Amazon EC2 fleet without affecting development and deployment. The company was running its service layer and microservices on AWS, underpinning about 3,000 applications, and the resulting speed and agility had contributed to some remarkable successes.
From 2016 to 2017, the volume of items sold on the Mercado Libre platform jumped 74 percent and net revenues climbed above $1.3 billion, a 66 percent increase. The year 2017 also saw Mercado Libre replace Yahoo on the NASDAQ 100 and receive recognition from the MIT Technology Review as one of "the 50 companies that best combine innovative technology with an effective business model."
To sustain this growth, the company wants developers to focus on products and features. "Our development team makes about 1,700 deployments daily," says Dario Simonassi, who was then leading the infrastructure team at Mercado Libre. "We want our developers to be able to stay focused on making great products, so we don't ask them to think about resources except in terms of how much memory their applications need."
Resource considerations fall instead to the company’s infrastructure team. In early 2018, that team was taking a closer look at the company's use of Amazon EC2. "The problem was that our AWS use was growing so fast that we would exceed our budget unless we could reduce the cost of our Amazon EC2 fleet," Simonassi says. "But we had to do so without disrupting the developer experience or holding the business back."
The team began evaluating its 20,000 Amazon EC2 Reserved and On-Demand Instances to see if enough could be shifted to Spot Instances to solve the budget challenge.
Minimizing the Impact of Interruptions
One of the most crucial considerations about Spot Instances is that—because the available amount of unused Amazon EC2 compute power fluctuates—they can be interrupted. Interruptions aren't instant—there's a two-minute warning—but the possibility means Spot Instances are better suited for Amazon EC2 use cases that tolerate such interruptions well, like big data workloads, stateless web apps, high-performance computing, and batch processing.
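To make the two-minute warning concrete, the following is a minimal sketch of how a workload might work out how much time it has left once a notice arrives. It assumes the notice has already been fetched from the EC2 instance metadata path shown in the comment and that it has the documented JSON shape with `action` and `time` fields. The function name and the sample timestamps are illustrative, not something Mercado Libre describes.

```python
import json
from datetime import datetime, timezone

# Documented EC2 instance metadata path for Spot interruption notices.
# It returns HTTP 404 until an interruption has been scheduled.
INSTANCE_ACTION_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"


def seconds_until_interruption(notice_json: str, now: datetime) -> float:
    """Return the seconds left before the action named in a Spot notice."""
    notice = json.loads(notice_json)
    # The notice carries a UTC timestamp such as "2018-03-01T12:02:00Z".
    action_time = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    action_time = action_time.replace(tzinfo=timezone.utc)
    return (action_time - now).total_seconds()


# Sample notice in the documented shape; the timestamps are made up.
sample = '{"action": "terminate", "time": "2018-03-01T12:02:00Z"}'
now = datetime(2018, 3, 1, 12, 0, 0, tzinfo=timezone.utc)
print(seconds_until_interruption(sample, now))  # 120.0
```

A stateless web app could use that window to stop accepting new requests and deregister from its load balancer before the instance is reclaimed.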
"We needed a comfort level where we could use enough Spot Instances to achieve significant cost savings without putting any applications at risk of failing because of interruptions," says Gabriel Eisbruch, a senior manager in charge of architecture, infrastructure, and DBA teams for Mercado Libre. While analyzing the company's use of Amazon EC2 instances, the team first reduced the size of its fleet from 20,000 instances to about 16,000. After further analysis, the team found that about half of these Amazon EC2 instances were good candidates for shifting to Spot Instances.
"We decided to use Spot Instances for all our stateless applications and everything else that wasn't a database," says David Pedreira, a senior manager for infrastructure at Mercado Libre. "At first, we limited Spot Instances to 30 percent of an application's servers, leaving about 70 percent on Reserved Instances. That way, we could be absolutely certain that—even if every Spot Instance were interrupted—the application wouldn't die."
After the first few weeks, Mercado Libre had experienced so few interruptions—and found no performance differences between Spot and Reserved or On-Demand Instances—that Eisbruch upped the volume of Spot Instances per application to as high as 50 percent.
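The per-application cap described above comes down to simple arithmetic. The sketch below splits an application's servers between Spot and Reserved capacity for a given cap, using the 30 and 50 percent figures from the text. The function name and the 20-server example are assumptions for illustration only.

```python
def split_fleet(total_servers: int, spot_percent: int) -> dict:
    """Split an application's servers between Spot and Reserved capacity."""
    if not 0 <= spot_percent <= 100:
        raise ValueError("spot_percent must be between 0 and 100")
    # Integer division rounds the Spot share down, so the cap is never exceeded.
    spot = total_servers * spot_percent // 100
    return {"spot": spot, "reserved": total_servers - spot}


# A hypothetical 20-server application under the initial and the later cap.
print(split_fleet(20, 30))  # {'spot': 6, 'reserved': 14}
print(split_fleet(20, 50))  # {'spot': 10, 'reserved': 10}
```

The `reserved` count is the capacity that would remain if every Spot Instance were interrupted at once, which is the worst case the team sized for.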
Saving 31% a Month with Amazon EC2 Spot Instances
As the company’s rapid growth continued, Mercado Libre found that it could almost immediately cut its overall infrastructure costs by about 20 percent simply by replacing many of its Amazon EC2 On-Demand Instances with Spot Instances and by taking advantage of opportunities to use Spot Instances where it might once have added new Reserved Instances. To further maximize savings and minimize failure risk, the company built an application to automate decisions about when and where to use Spot Instances and which instance families, types, sizes, models, and AWS Regions and Availability Zones would deliver the best results.
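The source does not describe how the automation chooses among instance types and Availability Zones, so the following is only one plausible sketch: filter candidate Spot pools by the memory an application needs, then take the cheapest few so that capacity is spread across more than one pool. The `SpotPool` class, the function name, the Region, and the prices are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpotPool:
    instance_type: str
    availability_zone: str
    memory_gib: float
    hourly_price_usd: float  # illustrative prices, not real quotes


def cheapest_pools(pools, needed_memory_gib, count=2):
    """Return up to `count` of the cheapest pools that meet the memory need."""
    eligible = [p for p in pools if p.memory_gib >= needed_memory_gib]
    return sorted(eligible, key=lambda p: p.hourly_price_usd)[:count]


pools = [
    SpotPool("m5.large", "us-east-1a", 8.0, 0.035),
    SpotPool("m4.large", "us-east-1b", 8.0, 0.031),
    SpotPool("c5.large", "us-east-1a", 4.0, 0.020),  # too little memory below
    SpotPool("r4.large", "us-east-1c", 15.25, 0.040),
]

for pool in cheapest_pools(pools, needed_memory_gib=8.0):
    print(pool.instance_type, pool.availability_zone)
# m4.large us-east-1b
# m5.large us-east-1a
```

Returning more than one pool is a deliberate choice in this sketch: drawing from several instance types and zones reduces the chance that a single capacity shortage interrupts every Spot server an application has.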
The Mercado Libre application includes a data warehouse built with Amazon Athena that makes use of data from AWS billing. "Our platform pulls price information, usage statistics, and app details to decide how many and what kind of Spot and Reserved Instances to use," says Eisbruch. "The whole process is invisible to developers, so they can keep concentrating on products."
Although the Mercado Libre platform considers app criticality before deciding the number of Spot Instances to assign, the company does use Spot Instances with mission-critical apps. "We might assign a different proportion of Spot Instances for a critical app, but there's usually a sensible way to use at least some," says Pedreira. "Even with the most critical apps—like our search service and traffic layer—we've found workloads where we can use Spot Instances without posing any interruption-related risks to the overall app."
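One way to picture a criticality-aware policy is a lookup from an application's tier to its Spot cap, with databases excluded entirely. The 30 and 50 percent caps echo figures mentioned in the text, but the tier names, the 10 percent cap for critical apps, and the function itself are assumptions made for this sketch, not Mercado Libre's actual rules.

```python
# Hypothetical caps: the maximum percentage of an app's servers placed on Spot.
SPOT_CAP_PERCENT = {"low": 50, "medium": 30, "critical": 10}


def spot_servers(servers: int, criticality: str, is_database: bool) -> int:
    """Return how many of an app's servers this sketch would place on Spot."""
    if is_database:
        # The team kept databases off Spot Instances altogether.
        return 0
    # Integer division rounds down, keeping the app at or below its cap.
    return servers * SPOT_CAP_PERCENT[criticality] // 100


print(spot_servers(40, "low", is_database=False))       # 20
print(spot_servers(40, "critical", is_database=False))  # 4
print(spot_servers(40, "low", is_database=True))        # 0
```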
The process of moving to Spot Instances was fast and easy.
"Within just the first week, we shifted 30 percent of our Reserved Instances to Spot Instances," says Eisbruch. "In the first month, we saved 20 percent by moving to Amazon EC2 Spot Instances. Since then, the cost of our Amazon EC2 fleet has been 31 percent lower than before the change."
Although the prospect of shifting to Spot Instances might seem daunting to some companies, Pedreira says, "It sounds scarier than it is. And shifting to even a large number of Amazon EC2 Spot Instances is a very straightforward process for anyone following good cloud architecture practices. As always with AWS, there is great documentation and support, so you don't have to figure it out alone."