Krux helps companies worldwide deliver more valuable and personalized marketing, media, and commerce experiences. Krux’s cloud-based data management platform operates in
Krux’s infrastructure is continuously live across each client’s digital footprint, including websites, devices, apps and campaigns, and Krux is embedded within every digital interaction between its clients and their consumers. Fast performance and unlimited scaling capacity are crucial to
Krux collects, stores, and makes every piece of audience data continuously available to its clients, managing more than 10 petabytes of on-demand data. Observing behavior comprehensively across media, not just campaigns, reduces bias and provides clients with greater accuracy and more sophisticated segmentation for targeting and analytics. Krux does not require its clients to use a pre-determined audience taxonomy; instead, it enables them to independently change their audience taxonomy as needed so they can quickly adapt to changing business needs.
This approach, however, puts enormous demands on the Krux system to quickly and efficiently process massive quantities of data. Krux needed tools to ensure it could deliver
Krux turned to AWS to manage data processing requirements that cover multiple modalities, including near-real-time, on-demand, and batch mode, and to perform on-demand analysis of petabytes of data. Krux uses a combination of Apache Hadoop on Amazon Elastic MapReduce (Amazon EMR) and Apache Spark to run machine learning jobs and extract, transform, and load (ETL) workloads, with Amazon Simple Storage Service (Amazon S3) as its core distributed storage system. Krux implemented the Amazon EMR infrastructure using Amazon EC2 Spot instances to gain access to
Krux uses AWS Data Pipeline, a service that moves data between different
The AWS services help Krux scale and deliver a highly flexible data-processing service to a global customer base. By using Apache Spark with Amazon EMR and AWS Data Pipeline, Krux has been able to increase the processing speed for its data science and ETL work by 100 percent.
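As a rough illustration of the kind of setup described above (Hadoop and Spark on Amazon EMR, Amazon S3 for storage, and Spot instances for part of the cluster), the sketch below builds the request arguments for the `run_job_flow` call in boto3, the AWS SDK for Python. It is not Krux's actual configuration: the cluster name, bucket, release label, instance types, and node counts are all placeholder assumptions, and the helper `build_emr_request` is hypothetical.

```python
def build_emr_request(bucket, core_nodes=2, spot_task_nodes=4):
    """Build keyword arguments for boto3's EMR run_job_flow call.

    Every value here is an illustrative assumption, not Krux's real setup.
    """
    return {
        "Name": "etl-sketch",                      # hypothetical cluster name
        "ReleaseLabel": "emr-6.15.0",              # assumed EMR release
        "Applications": [{"Name": "Hadoop"}, {"Name": "Spark"}],
        "LogUri": f"s3://{bucket}/emr-logs/",      # cluster logs written to S3
        "JobFlowRole": "EMR_EC2_DefaultRole",      # default EMR instance profile
        "ServiceRole": "EMR_DefaultRole",          # default EMR service role
        "Instances": {
            # Shut the cluster down once its steps finish (batch-style use).
            "KeepJobFlowAliveWhenNoSteps": False,
            "InstanceGroups": [
                {
                    "Name": "master",
                    "InstanceRole": "MASTER",
                    "Market": "ON_DEMAND",
                    "InstanceType": "m5.xlarge",
                    "InstanceCount": 1,
                },
                {
                    "Name": "core",
                    "InstanceRole": "CORE",
                    "Market": "ON_DEMAND",
                    "InstanceType": "m5.xlarge",
                    "InstanceCount": core_nodes,
                },
                {
                    # Task nodes hold no HDFS data, so they are the usual
                    # place to use lower-cost Spot capacity.
                    "Name": "task-spot",
                    "InstanceRole": "TASK",
                    "Market": "SPOT",
                    "InstanceType": "m5.xlarge",
                    "InstanceCount": spot_task_nodes,
                },
            ],
        },
    }


request = build_emr_request("example-bucket")
spot_groups = [
    g["Name"] for g in request["Instances"]["InstanceGroups"]
    if g["Market"] == "SPOT"
]
print(spot_groups)
```

With AWS credentials configured, the dictionary could be passed to the service as `boto3.client("emr").run_job_flow(**request)`. Keeping the master and core groups on On-Demand capacity and placing only task nodes on Spot is one common way to limit the impact of Spot interruptions; the source does not say how Krux divides its cluster.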
Roopak Gupta, vice president of applications and data, says that by using AWS services, Krux has been able to increase the performance of its iterative jobs, making its systems faster and more efficient. Using Amazon EC2 Spot instances helps the company save up to 40 percent over Amazon EC2
“With AWS, we can manage flexible capacity changes, contain overall costs on daily