The Unified Intelligence Platform (UIP) team at Salesforce manages a petabyte-scale data lake and wanted to modernize how it analyzes and processes data, with an eye toward cost savings and greater efficiency. Using a mix of instance purchase options from Amazon Elastic Compute Cloud (Amazon EC2), which provides secure and resizable compute capacity for virtually any workload, on Amazon Web Services (AWS), the UIP team built a scalable, elastic compute infrastructure. The remodeled infrastructure processes twice as much data in less time while saving the company over $1 million per month.
Salesforce uses AWS for its data workflows, while AWS uses Salesforce for its customer relationship management. This partnership makes it simpler for developers using both technologies to build and launch customer applications, use AWS services natively within Salesforce, and securely connect data and workflows across both Salesforce and AWS.
Opportunity | Architecting for Efficiency on AWS
In late 2019, the UIP team began to transition its on-premises cluster to Amazon EMR, a cloud big data platform for running large-scale distributed data processing jobs, interactive SQL queries, and machine learning applications using open-source analytics frameworks such as Apache Spark, Apache Hive, and Trino. The use of Amazon EMR helps Salesforce to reduce the complexity of managing its big data frameworks, and it provides native integration for Amazon EC2 Spot Instances, which help companies take advantage of unused Amazon EC2 capacity in the cloud.
The team identified its workloads running on Apache Spark as good candidates for Spot Instances as part of a cost-optimization strategy. Apache Spark on Amazon EMR features automatic retries, which help provide resiliency when Spot Instances are reclaimed because of Amazon EC2 capacity requirements. The UIP team also uses Spark Streaming to process data in 5-minute windows, which further helps minimize the chance that work has to be retried.
To find the greatest availability at the lowest price, Salesforce UIP experimented with using Spot Instances alongside other Amazon EC2 purchase options. It ran scenarios with various ratios of Spot Instances to Amazon EC2 On-Demand Instances, for which companies pay for compute capacity by the second with no long-term commitment and have full control over the instance’s lifecycle. For its On-Demand Instances, Salesforce UIP was using Savings Plans, a flexible pricing model offering lower prices compared to On-Demand pricing in exchange for a specific usage commitment. Seeking to balance the reliability of the cluster with discounts on instances, the UIP team identified an optimal configuration in which 60 percent of its Amazon EC2 usage runs on Spot Instances and the remainder is covered by Savings Plans.
To meet its service level agreements (SLAs), Salesforce UIP created its fleets with performance in mind. It splits workloads into SLA and non-SLA clusters depending on the processor characteristics and the ability to meet SLA time requirements. “In terms of building our fleet, I think the real benefit here is how many instance types you can use,” says Eric Legault, principal engineer at Salesforce. “Using AWS helped us to play with many different configurations of machines and try different scenarios. It’s just a matter of changing the configuration and a couple of hours later we could see if it actually worked or not. So I think that was a huge part of making this a success.”
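As a rough illustration of the 60 percent Spot split described above, the following sketch divides a target capacity between Spot and On-Demand units. The function name and the 1,000-unit total are hypothetical; only the 60 percent figure comes from the case study, and the key names simply mirror the capacity fields used by Amazon EMR instance fleets.

```python
def split_capacity(total_units: int, spot_percent: int = 60) -> dict:
    """Split a target capacity into Spot and On-Demand units.

    total_units is a hypothetical fleet size; spot_percent defaults to the
    60 percent figure reported in the case study.
    """
    if total_units < 0:
        raise ValueError("total_units must be non-negative")
    if not 0 <= spot_percent <= 100:
        raise ValueError("spot_percent must be between 0 and 100")
    # Integer arithmetic avoids floating-point rounding surprises.
    spot_units = total_units * spot_percent // 100
    return {
        "TargetSpotCapacity": spot_units,
        "TargetOnDemandCapacity": total_units - spot_units,
    }


# Hypothetical example: a fleet sized at 1,000 capacity units.
example_split = split_capacity(1000)
print(example_split)
```

In this sketch the On-Demand remainder is the portion that Savings Plans coverage would apply to; the actual ratio a team settles on depends on its own reliability and cost targets.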
Solution | Saving Millions Using Spot Instances
In alignment with AWS best practices for the use of Spot Instances, the team built additional stability into its fleet by moving from two or three instance types to 27 instance types while maintaining the same capacity. The UIP team runs 12 clusters specialized for different purposes, using instance types optimized for CPU, memory, or balanced workloads and featuring a mix of Intel and AMD processors. “The more instance types you have, the more resilient your cluster,” says Legault.
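The diversification idea can be sketched as an instance fleet definition that lists many instance types for a single fleet. This is a minimal sketch, not the team's actual configuration: the dictionary follows the shape that the Amazon EMR API accepts for instance fleets, while the fleet name, the six instance types, the capacities, and the timeout values are assumptions chosen only for illustration.

```python
def build_task_fleet(instance_types, spot_units, on_demand_units=0):
    """Build an EMR-style instance fleet dict that spreads capacity
    across several instance types (all values here are illustrative)."""
    if len(set(instance_types)) != len(instance_types):
        raise ValueError("instance types must be unique")
    return {
        "Name": "task-fleet",  # hypothetical name
        "InstanceFleetType": "TASK",
        "TargetSpotCapacity": spot_units,
        "TargetOnDemandCapacity": on_demand_units,
        # One entry per instance type; more entries give the fleet more
        # Spot capacity pools to draw from.
        "InstanceTypeConfigs": [
            {"InstanceType": t, "WeightedCapacity": 1} for t in instance_types
        ],
        "LaunchSpecifications": {
            "SpotSpecification": {
                "TimeoutDurationMinutes": 10,  # assumed value
                "TimeoutAction": "SWITCH_TO_ON_DEMAND",
                "AllocationStrategy": "capacity-optimized",
            }
        },
    }


# Illustrative mix of balanced, memory-optimized, and CPU-optimized types
# on Intel and AMD processors; not the 27 types Salesforce uses.
example_types = [
    "m5.4xlarge", "m5a.4xlarge",
    "r5.4xlarge", "r5a.4xlarge",
    "c5.4xlarge", "c5a.4xlarge",
]
example_fleet = build_task_fleet(example_types, spot_units=60)
print(len(example_fleet["InstanceTypeConfigs"]), "instance types in the fleet")
```

A dictionary of this shape could be supplied as one entry in the instance fleets list when a cluster is created; the sketch only builds the structure and makes no AWS calls.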
Salesforce UIP is now processing 200–250 TB a day and writing about double that amount of data for about the same cost previously required to process 100 TB. By rearchitecting ingestion processing around a streaming approach and using Spot Instances to expand its infrastructure while controlling costs, the team also improved processing efficiency, reducing the time to ingest and process data from 4 hours to 15 minutes. To provide greater visibility into its compute metrics, Salesforce uses Amazon CloudWatch, a monitoring and observability service for AWS resources. The team monitors applications using custom dashboards built with Grafana, open-source software that provides visualization for operational metrics.
Salesforce’s UIP solution also uses Amazon EMR Managed Scaling, which automatically resizes the cluster for best performance at the lowest possible cost. To offset the possibility of a Spot Instance reclamation, Salesforce lets its cluster scale about 5 percent over capacity. The additional capacity speeds up processing and gives the cluster more availability and stability when necessary. “I think that’s really the benefit, that we can scale higher and scale down when the capacity is not needed,” Legault says. “This process also takes care of whatever went out in terms of reclaim.” Plus, a new capability within Amazon EMR Managed Scaling prevents the UIP clusters from scaling down instances that store intermediate shuffle data for Apache Spark, which leads to better performance and lower cost.
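One way to read the 5 percent headroom described above is as a maximum capacity limit set slightly above the expected peak. The sketch below builds a managed scaling policy under that assumption. The `ComputeLimits` structure follows the shape of the Amazon EMR managed scaling API; the function name and the minimum and peak values are hypothetical, and the case study does not say how the team actually configures the headroom.

```python
def managed_scaling_policy(min_units: int, expected_peak_units: int,
                           headroom_percent: int = 5) -> dict:
    """Build an EMR-style managed scaling policy whose maximum capacity
    sits headroom_percent above the expected peak (an assumed approach)."""
    if min_units < 0 or expected_peak_units < min_units:
        raise ValueError("need 0 <= min_units <= expected_peak_units")
    # Ceiling division so the headroom is never rounded away.
    scaled = expected_peak_units * (100 + headroom_percent)
    max_units = -(-scaled // 100)
    return {
        "ComputeLimits": {
            "UnitType": "InstanceFleetUnits",
            "MinimumCapacityUnits": min_units,
            "MaximumCapacityUnits": max_units,
        }
    }


# Hypothetical numbers: scale between 100 units and a 1,000-unit peak.
example_policy = managed_scaling_policy(min_units=100, expected_peak_units=1000)
print(example_policy["ComputeLimits"]["MaximumCapacityUnits"])
```

A policy of this shape could be attached to a running cluster through the Amazon EMR API; the sketch stops at building the dictionary so it can be inspected without AWS credentials.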
Salesforce UIP’s incorporation of Spot Instances complements its Savings Plans usage and lowers compute costs by more than 60 percent, saving the company over $1 million a month. In addition to two clusters that run full time, the UIP team can cost-effectively scale up a third cluster when necessary to accommodate a sudden influx of data. The UIP team’s use of Spot Instances to handle upscaling also helps lower costs in other areas of Salesforce, freeing Savings Plans capacity that other teams can use to reduce costs during peak periods.
Outcome | Building Intelligence into Fleet Management
As part of the optimization process, the UIP team uses Spot Instance advisor, which helps companies determine pools with the least chance of interruption and provides savings over On-Demand rates. The team hopes to build even more intelligence into dynamic fleet management using the Amazon EC2 Spot placement score, which can recommend, in near real time, an AWS Region or Availability Zone based on Salesforce’s requirements. Using the Amazon EC2 Spot placement score, the UIP team plans to find even greater capacity and lower prices as it expands across AWS Regions. “We use the capacity of the cloud and the wide range of Amazon EC2 instance types to do things we couldn’t do on premises,” Legault says. “Amazon EMR Managed Scaling plays a big part in our ability to use the elastic capability of the cloud. And we significantly reduce costs just by using Spot Instances in an innovative way.”
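The kind of selection logic the team hopes to build on the Spot placement score might look like the sketch below, which filters and ranks scored Regions. The input mirrors the shape of an Amazon EC2 Spot placement score response (a list of entries with a `Region` and a `Score` from 1 to 10); the sample scores, the threshold of 7, and the function name are all made up for illustration, and no AWS call is made.

```python
def rank_placements(response: dict, min_score: int = 7) -> list:
    """Return Regions scoring at least min_score, best first.

    response is expected to look like a Spot placement score result:
    {"SpotPlacementScores": [{"Region": ..., "Score": ...}, ...]}
    """
    scores = response.get("SpotPlacementScores", [])
    eligible = [entry for entry in scores if entry["Score"] >= min_score]
    # Highest score first; ties keep their original order (stable sort).
    ranked = sorted(eligible, key=lambda entry: entry["Score"], reverse=True)
    return [entry["Region"] for entry in ranked]


# Made-up sample data in the shape of a placement score response.
sample_response = {
    "SpotPlacementScores": [
        {"Region": "us-west-2", "Score": 7},
        {"Region": "eu-west-1", "Score": 4},
        {"Region": "us-east-1", "Score": 9},
    ]
}
preferred_regions = rank_placements(sample_response)
print(preferred_regions)
```

A higher score indicates a better chance that a Spot request of the given size succeeds in that Region, so a fleet manager could try Regions in the returned order.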
As the top customer relationship management (CRM) and customer engagement platform, Salesforce serves more than 150,000 companies globally. Salesforce unites sales, service, marketing, commerce, and IT teams with a single, shared view of customer information, helping to grow relationships with customers and employees alike.
AWS Services Used
Amazon Elastic Compute Cloud (EC2)
Amazon EC2 offers the broadest and deepest compute platform, with over 500 instances and choice of the latest processor, storage, networking, operating system, and purchase model to help you best match the needs of your workload.
Amazon EC2 Spot Instances
Amazon EC2 Spot Instances let you take advantage of unused EC2 capacity in the AWS cloud.
Savings Plans
Savings Plans is a flexible pricing model offering lower prices compared to On-Demand pricing, in exchange for a specific usage commitment (measured in $/hour) for a one- or three-year period.
Amazon EMR Managed Scaling
With EMR Managed Scaling you specify the minimum and maximum compute limits for your clusters and Amazon EMR automatically resizes them for best performance and resource utilization.