“Salesforce DMP, part of Salesforce Marketing Cloud, has acquired and scaled a global customer base without any concern about the limitations on the amount of data it can process. This is true even as our platform captures data from interactions from about three billion devices and about 40 billion page views each month. AWS has provided the tools to make this happen.”  
Roopak Gupta, Vice President, Software Engineering, Salesforce DMP

Salesforce DMP, formerly Krux, is the leading data management platform (DMP) that unifies, segments and activates audiences to increase engagement with users, prospects and customers. With Salesforce DMP, marketers can deliver more relevant and valuable customer experiences by capturing, unifying and activating data signatures across every device (desktop, mobile, tablet and set-top) and every channel (display, social, search and video) in real time. Monthly, Salesforce DMP interacts with more than three billion browsers and devices, supports more than 200 billion data collection events, processes more than three billion CRM records and orchestrates more than 200 billion personalized consumer experiences.  

Salesforce DMP's infrastructure is continuously live across each customer's digital footprint, including websites, devices, apps and campaigns, and Salesforce DMP is embedded in every digital interaction between its clients and their consumers. Fast performance and effectively unlimited scaling capacity are crucial for delivering both real-time personalized experiences to consumers and data-driven insights to clients.

Salesforce DMP collects, stores and makes every piece of audience data continuously available to its clients, managing more than 10 petabytes of on-demand data. Observing behavior comprehensively across media, not just campaigns, reduces bias and provides clients with greater accuracy and more sophisticated segmentation for targeting and analytics. Salesforce DMP does not require its clients to use a pre-determined audience taxonomy; instead, it enables them to independently change their audience taxonomy as needed so they can quickly adapt to changing business needs.

This approach, however, puts enormous demands on the Salesforce DMP system to quickly and efficiently process massive quantities of data. Salesforce DMP needed tools to ensure it could deliver high return on investment to its clients while continuously developing and bringing new platform features to market.

Salesforce DMP turned to AWS to meet data processing requirements that span multiple modes (near-real-time, on-demand, and batch) and to run on-demand analysis over petabytes of data. Salesforce DMP uses a combination of Apache Hadoop on Amazon Elastic MapReduce (Amazon EMR) and Apache Spark to run machine learning jobs and extract, transform, and load (ETL) workloads, with Amazon Simple Storage Service (Amazon S3) as its core distributed storage system. Salesforce DMP runs its Amazon EMR infrastructure on Amazon Elastic Compute Cloud (Amazon EC2) Spot Instances to obtain compute capacity at reduced cost.
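To make the EMR-plus-Spot setup more concrete, here is a minimal sketch of an Amazon EMR cluster request that installs Hadoop and Spark, runs one Spark ETL step against S3 paths, and places task nodes on Spot capacity. The cluster name, S3 URIs, instance types, release label, and the On-Demand-core/Spot-task split are all hypothetical. That split is a common EMR pattern, not a description of Salesforce DMP's actual configuration. The function only builds the request dictionary and makes no AWS calls.

```python
"""Sketch: an Amazon EMR cluster request with Hadoop, Spark and Spot task nodes.

Every name, S3 URI, instance type and the release label here is hypothetical.
"""


def build_emr_request(
    name: str,
    script_uri: str,
    input_uri: str,
    output_uri: str,
    log_uri: str,
    spot_task_nodes: int = 4,
) -> dict:
    """Return keyword arguments for boto3's EMR client run_job_flow()."""
    if spot_task_nodes < 0:
        raise ValueError("spot_task_nodes must be non-negative")
    groups = [
        # Master and core nodes stay On-Demand: core nodes run HDFS, so a Spot
        # interruption there is more disruptive than losing a task node.
        {"Name": "master", "Market": "ON_DEMAND", "InstanceRole": "MASTER",
         "InstanceType": "m5.xlarge", "InstanceCount": 1},
        {"Name": "core", "Market": "ON_DEMAND", "InstanceRole": "CORE",
         "InstanceType": "r5.2xlarge", "InstanceCount": 2},
    ]
    if spot_task_nodes:
        # Task nodes only add compute, so they can run on cheaper Spot capacity.
        groups.append({"Name": "task-spot", "Market": "SPOT", "InstanceRole": "TASK",
                       "InstanceType": "r5.2xlarge", "InstanceCount": spot_task_nodes})
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",  # hypothetical EMR release choice
        "LogUri": log_uri,
        "Applications": [{"Name": "Hadoop"}, {"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": groups,
            # Shut the transient cluster down once its steps have finished.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [
            {
                "Name": "spark-etl",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    # command-runner.jar lets an EMR step invoke spark-submit.
                    "Jar": "command-runner.jar",
                    "Args": ["spark-submit", "--deploy-mode", "cluster",
                             script_uri, input_uri, output_uri],
                },
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


if __name__ == "__main__":
    request = build_emr_request(
        "audience-etl", "s3://example-bucket/jobs/etl.py",
        "s3://example-bucket/raw/", "s3://example-bucket/curated/",
        "s3://example-bucket/emr-logs/",
    )
    print([g["Market"] for g in request["Instances"]["InstanceGroups"]])
```

With AWS credentials configured, the dictionary could be submitted with `boto3.client("emr").run_job_flow(**request)`. Setting `KeepJobFlowAliveWhenNoSteps` to `False` makes the cluster transient, so it is paid for only while the ETL step runs.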

Salesforce DMP uses AWS Data Pipeline, a service that moves data between AWS compute and storage services, to schedule its Apache Hadoop and Apache Spark jobs. It uses Amazon EMR to launch and manage clusters whenever they are needed, supporting its on-demand batch and data-science processing framework. It also uses Amazon DynamoDB to store user-segment membership data, which is made available globally across devices and applications.
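As an illustration of how segment membership could be kept in DynamoDB, here is a minimal sketch. It assumes a hypothetical table keyed by a device ID, where each item holds a string-set attribute of segment IDs. The table name, the key name `device_id`, and the attribute name `segments` are all assumptions, not Salesforce DMP's actual schema. The helpers only build low-level `UpdateItem` request dictionaries and make no AWS calls.

```python
"""Sketch: DynamoDB UpdateItem requests for device-to-segment membership.

Table name, key name (device_id) and attribute name (segments) are hypothetical.
"""


def _membership_update(table: str, device_id: str, segment_ids, action: str) -> dict:
    """Build an UpdateItem request that ADDs to or DELETEs from a string set."""
    values = sorted(set(segment_ids))
    if not values:
        # DynamoDB does not allow empty sets, so refuse to build such a request.
        raise ValueError("segment_ids must contain at least one segment")
    return {
        "TableName": table,
        "Key": {"device_id": {"S": device_id}},
        # "#seg" aliases the attribute name so it cannot clash with a reserved word.
        "UpdateExpression": f"{action} #seg :ids",
        "ExpressionAttributeNames": {"#seg": "segments"},
        "ExpressionAttributeValues": {":ids": {"SS": values}},
    }


def add_segments(table: str, device_id: str, segment_ids) -> dict:
    # ADD unions the values into the string set, creating the item or attribute if absent.
    return _membership_update(table, device_id, segment_ids, "ADD")


def remove_segments(table: str, device_id: str, segment_ids) -> dict:
    # DELETE removes the given values from the existing string set.
    return _membership_update(table, device_id, segment_ids, "DELETE")


if __name__ == "__main__":
    print(add_segments("segment_membership", "device-123", ["autos", "sports"]))
```

Each request could be sent with `boto3.client("dynamodb").update_item(**request)`. Using set `ADD`/`DELETE` lets a segment job change membership without first reading the item, which keeps writes independent when many jobs update the same device.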

These AWS services help Salesforce DMP scale and deliver a highly flexible data processing service to a global customer base. By using Apache Spark on Amazon EMR together with AWS Data Pipeline, Salesforce DMP has doubled the processing speed of its data science and ETL work.

Roopak Gupta, vice president of software engineering, says that AWS services have improved the performance of Salesforce DMP's iterative jobs, making its systems faster and more efficient. Using Amazon EC2 Spot Instances saves the company up to 40 percent compared with Amazon EC2 Reserved Instance pricing and up to 80 percent compared with On-Demand pricing.
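The savings figures come from simple ratios of hourly prices. The sketch below shows that arithmetic using hypothetical per-instance-hour prices, which are placeholders and not published AWS rates. They were chosen so that the results fall within the "up to" ceilings quoted above: 70 percent versus On-Demand and 40 percent versus Reserved.

```python
"""Sketch: turning hourly prices into percentage savings.

All prices below are hypothetical placeholders, not published AWS rates.
"""


def savings_percent(baseline_hourly: float, actual_hourly: float) -> float:
    """Percent saved by paying actual_hourly instead of baseline_hourly."""
    if baseline_hourly <= 0:
        raise ValueError("baseline price must be positive")
    return (1.0 - actual_hourly / baseline_hourly) * 100.0


def workload_cost(hourly: float, instances: int, hours: float) -> float:
    """Cost of running `instances` nodes for `hours` at a flat hourly price."""
    return hourly * instances * hours


# Hypothetical per-instance-hour prices for a single instance type.
ON_DEMAND = 1.00
RESERVED = 0.50
SPOT = 0.30

if __name__ == "__main__":
    for label, baseline in (("On-Demand", ON_DEMAND), ("Reserved", RESERVED)):
        print(f"Spot vs {label}: {savings_percent(baseline, SPOT):.0f}% saved")
    # The same hypothetical 20-node, 100-hour workload under each pricing model.
    for label, price in (("On-Demand", ON_DEMAND), ("Reserved", RESERVED), ("Spot", SPOT)):
        print(f"{label}: ${workload_cost(price, 20, 100):,.2f}")
```

Because the percentage is measured against different baselines, the same Spot price yields a smaller saving against already-discounted Reserved pricing than against On-Demand pricing. This is why the two quoted figures differ.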

“With AWS, we can manage flexible capacity changes, contain overall costs on daily compute tasks, and manage overall infrastructure growth,” Gupta says. “At the same time, our clients have complete freedom to change their minds and evolve their segmentation strategies without limits, which would not be possible with traditional, non-cloud-based infrastructure environments. This additional flexibility enables our clients to experiment and innovate using our platform to ensure their business results.”
