Saving Millions without Sacrificing Performance of 100+ PB Data Lake Using Amazon S3 Intelligent-Tiering with Salesforce
Learn how Salesforce, a cloud software company, reduced costs without sacrificing the performance of its 100+ PB data lake using Amazon S3 Intelligent-Tiering.
Cloud-based customer relationship management software company Salesforce processes and stores massive amounts of customer data every day. More than 100 internal teams and more than 1,000 internal users rely on Unified Intelligence Platform (UIP), Salesforce’s internal data lake of more than 100 PB. UIP stores logs from multiple applications for analytics. Scalability is essential for UIP to support a high volume of users and a wide variety of use cases.
Facing scalability concerns and long retrieval times for the on-premises infrastructure of the data lake, Salesforce launched its UIP team to migrate to a cloud solution. Salesforce chose to use Amazon Web Services (AWS) to reduce maintenance and make it simpler to increase capacity when needed. Using services like Amazon Simple Storage Service (Amazon S3), object storage built to retrieve any amount of data from anywhere, Salesforce saves millions of dollars annually while increasing the performance, efficiency, and scalability of its data lake.
Opportunity | Using Amazon S3 Intelligent-Tiering and Amazon EMR to Reduce Costs and Maintenance of a Large Data Lake for Salesforce
Salesforce offers businesses cloud-based software so that all teams—from sales and marketing to IT—can share a single view of customer data and better connect to their customers.
Previously, Salesforce used on-premises infrastructure for UIP, ingesting over 100 TB of data per day, representing over 1 trillion events. Staff often needed to spend time resolving local machine disk failures and maintaining the physical hardware. They struggled to accommodate requests to increase capacity due to the additional hardware required.
In 2020, the UIP team began migrating the data lake from an on-premises infrastructure to AWS, becoming one of the over 700,000 data lakes that are built on Amazon S3. With an Amazon S3 data lake, users in Salesforce’s organization can discover, access, and analyze all their data, regardless of where it lives, in a secure and governed way.
For compute, UIP relies on Amazon EMR, a cloud big data solution for petabyte-scale data processing, interactive analytics, and machine learning using open-source frameworks. To shorten the migration timeline, Salesforce used the AWS Migration Acceleration Program (AWS MAP), a comprehensive cloud migration program that uses an outcome-driven methodology developed by migrating thousands of enterprise customers to the cloud.
Shortly after completing the migration, the UIP team began enhancing the infrastructure. To reduce Amazon S3 costs while maintaining optimal performance, the UIP team started to use the Amazon S3 Intelligent-Tiering (S3 Intelligent-Tiering) storage class, which delivers automatic storage cost savings by moving data to the most cost-effective access tier when access patterns change. Throughout the migration, the team worked alongside AWS representatives to optimize the S3 Intelligent-Tiering storage class to fit Salesforce’s needs. “AWS has supported us from the beginning and is always there when we face a challenge. The AWS team has offered continuous support over the years,” says Eric Legault, principal engineer at Salesforce.
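One common way to place data in the S3 Intelligent-Tiering storage class is an S3 Lifecycle rule that transitions objects on day 0. The following is a minimal sketch of that approach, not a description of how Salesforce configured UIP. The bucket name, rule ID, and prefix are hypothetical, and the helper names are assumptions introduced for illustration.

```python
def intelligent_tiering_lifecycle(prefix: str = "") -> dict:
    """Build a lifecycle configuration that moves objects under `prefix`
    to the INTELLIGENT_TIERING storage class as soon as they are eligible."""
    return {
        "Rules": [
            {
                "ID": "move-to-intelligent-tiering",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": 0, "StorageClass": "INTELLIGENT_TIERING"}
                ],
            }
        ]
    }


def apply_lifecycle(bucket: str, configuration: dict) -> None:
    """Apply the configuration with boto3 (requires AWS credentials)."""
    import boto3  # imported lazily so the builder above has no dependencies

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=configuration
    )


if __name__ == "__main__":
    # Print the rule only; call apply_lifecycle("example-bucket", config)
    # against a real bucket to put it into effect.
    config = intelligent_tiering_lifecycle("logs/")
    print(config)
```

New objects can also be written directly to the storage class by passing `StorageClass="INTELLIGENT_TIERING"` to `put_object`, which avoids the need for a lifecycle rule.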
Solution | Saving Millions of Dollars Annually While Increasing Elasticity Using Amazon S3 Intelligent-Tiering and Amazon EMR
With the S3 Intelligent-Tiering automatic access tiers, frequently accessed data stays in the Frequent Access tier, data that hasn’t been accessed for 30 consecutive days moves to the Infrequent Access tier, and data that hasn’t been accessed for 90 consecutive days moves to the Archive Instant Access tier. About 50 percent of data from the data lake is stored in the Archive Instant Access tier, a very-low-cost access tier that offers significant cost savings without compromising on performance. All three access tiers have the same low-latency and high-throughput performance. If Salesforce’s users need to retrieve data from the last several years, there is no retrieval fee, and the data is available right away. “We’re saving millions of dollars per year by using S3 Intelligent-Tiering,” says Legault.
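The tiering rule described above can be sketched as a small function. This is an illustrative model only: S3 applies the rule internally, and the function name and constants are assumptions rather than part of any AWS API.

```python
from datetime import date

# Thresholds from the description above: days without access before an
# object moves to a lower-cost tier.
INFREQUENT_AFTER_DAYS = 30
ARCHIVE_INSTANT_AFTER_DAYS = 90


def access_tier(last_accessed: date, today: date) -> str:
    """Return the automatic access tier an object would sit in,
    given the date it was last accessed."""
    idle_days = (today - last_accessed).days
    if idle_days >= ARCHIVE_INSTANT_AFTER_DAYS:
        return "Archive Instant Access"
    if idle_days >= INFREQUENT_AFTER_DAYS:
        return "Infrequent Access"
    return "Frequent Access"


if __name__ == "__main__":
    today = date(2024, 6, 1)
    for last in (date(2024, 5, 20), date(2024, 4, 15), date(2024, 1, 1)):
        print(last, "->", access_tier(last, today))
```

In the real service, accessing an object in a lower-cost tier moves it back to the Frequent Access tier, which this model captures because a new access resets `last_accessed`.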
By migrating the infrastructure of the data lake to the cloud, Salesforce lowered its operational overhead, further saving on costs and reducing maintenance for staff. “We can better control our costs for data storage and the associated maintenance when we delegate to AWS,” says Legault. “The amount of work we need to do is a small percentage of what it used to be when everything was our responsibility on premises.”
Salesforce’s data lake has grown significantly as it accumulates internal logs from multiple applications. This centralized repository gives users a single place to analyze that data and extract insights. The migration to AWS has improved both the performance and elasticity of the data lake for a better user experience. UIP uses up to 30 instance types on Amazon Elastic Compute Cloud (Amazon EC2), which provides secure and resizable compute capacity for virtually any workload. To address changes in performance needs, UIP uses the managed scaling feature for Amazon EMR, which automatically increases or decreases the number of instances or units in a cluster based on workload. Amazon S3 provides elasticity at the storage level, scaling to accommodate growing amounts of data. “With our infrastructure built on AWS, it’s simple to scale up and down,” says Legault. “We can increase the storage we use or decrease the storage if data is no longer needed.”
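Amazon EMR managed scaling is configured by giving a cluster minimum and maximum capacity limits, and EMR resizes the cluster within them. The sketch below shows the general shape of such a policy using boto3. The cluster ID and the limits are hypothetical values, not Salesforce’s settings, and the helper names are assumptions introduced for illustration.

```python
def managed_scaling_policy(min_units: int, max_units: int) -> dict:
    """Build a managed scaling policy that lets EMR resize a cluster
    between `min_units` and `max_units` instances."""
    if not 0 < min_units <= max_units:
        raise ValueError("expected 0 < min_units <= max_units")
    return {
        "ComputeLimits": {
            "UnitType": "Instances",
            "MinimumCapacityUnits": min_units,
            "MaximumCapacityUnits": max_units,
        }
    }


def apply_policy(cluster_id: str, policy: dict) -> None:
    """Attach the policy to a running cluster (requires AWS credentials)."""
    import boto3  # imported lazily so the builder above has no dependencies

    emr = boto3.client("emr")
    emr.put_managed_scaling_policy(
        ClusterId=cluster_id, ManagedScalingPolicy=policy
    )


if __name__ == "__main__":
    # Hypothetical limits; call apply_policy("j-EXAMPLE", policy) against
    # a real cluster ID to put them into effect.
    policy = managed_scaling_policy(min_units=2, max_units=50)
    print(policy)
```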
For users, the data lake is more reliable, efficient, and scalable using a cloud infrastructure. Because the UIP team spends less time managing the infrastructure, it can also spend more time adding tools to enhance the customer experience. For example, the UIP team added tools for users to ingest data using predefined patterns so that onboarding is more efficient. Users also have access to Amazon SageMaker—fully managed infrastructure, tools, and workflows for building, training, and deploying machine learning models for any use case—to build machine learning models on top of the data lake for advanced analytics. “When we save on costs using technology like S3 Intelligent-Tiering,” says Legault, “we can invest money into added-value solutions such as Amazon SageMaker projects and make a bigger impact in the business.”
Outcome | Expanding the Use of Amazon S3 Intelligent-Tiering to More Business Units at Salesforce
The UIP team was the first to start using S3 Intelligent-Tiering at Salesforce. Because of the success and significant cost savings realized by the UIP team, Salesforce is starting to roll out S3 Intelligent-Tiering to other business units across the company. “By migrating the infrastructure of our data lake to AWS, our users can expand their horizon, explore new technology, and process more data to potentially see patterns they couldn’t before,” says Legault. “Using AWS, we can process data more efficiently because we can scale up and down. We can provision capacity a lot more efficiently than we could on premises, where we needed to maintain hardware whether we used it or not.”
Salesforce offers businesses cloud-based software so that all teams can share a single view of data. Its Unified Intelligence Platform team manages a 100-plus PB internal data lake that stores logs from multiple applications.
AWS Services Used
Amazon S3
Amazon Simple Storage Service (Amazon S3) is an object storage service offering industry-leading scalability, data availability, security, and performance.
Amazon S3 Intelligent-Tiering
Amazon S3 Intelligent-Tiering is the only cloud storage class that delivers automatic storage cost savings when data access patterns change, without performance impact or operational overhead.
Amazon EMR
Amazon EMR is the industry-leading cloud big data solution for petabyte-scale data processing, interactive analytics, and machine learning using open-source frameworks such as Apache Spark, Apache Hive, and Presto.
AWS Migration Acceleration Program
The AWS Migration Acceleration Program (MAP) is a comprehensive and proven cloud migration program based upon AWS’s experience migrating thousands of enterprise customers to the cloud.