Krux has acquired and scaled a global customer base without any concern about the limitations on the amount of data it can process. This is true even as our platform captures data from interactions with about three billion devices and about 40 billion page views each month. AWS has provided the tools to make this happen.
Roopak Gupta, Vice President of Applications & Data

Krux helps companies worldwide deliver more valuable and personalized marketing, media, and commerce experiences. Krux’s cloud-based data management platform operates in real time, unifying people data—an industry term for consumers and their data characteristics—from all screens and sources into a single view of the individual. It helps companies analyze the data to understand each individual's preferences and activate the data across any delivery channel. Organizations such as Time Warner, Kellogg’s, BBC, Le Figaro, Axel Springer, and Ticketmaster use the Krux platform. Krux, which was founded in 2010 and is based in San Francisco with multiple global offices, interacts with about three billion devices worldwide, serves more than 40 billion page views and 150 billion ad impressions per month, and processes two billion customer relationship management records per month.

Krux’s infrastructure is continuously live across each client’s digital footprint, including websites, devices, apps, and campaigns, and Krux is embedded within every digital interaction between its clients and their consumers. Fast performance and unlimited scaling capacity are crucial to delivering both real-time personalized experiences to consumers and data-driven insights to clients.

Krux collects, stores, and makes every piece of audience data continuously available to its clients, managing more than 10 petabytes of on-demand data. Observing behavior comprehensively across media, not just campaigns, reduces bias and provides clients with greater accuracy and more sophisticated segmentation for targeting and analytics. Krux does not require its clients to use a pre-determined audience taxonomy; instead, it enables them to independently change their audience taxonomy as needed so they can quickly adapt to changing business needs.

This approach, however, puts enormous demands on the Krux system to quickly and efficiently process massive quantities of data. Krux needed tools to ensure it could deliver high return on investment to its clients while continuously developing and bringing new platform features to market.

Krux turned to AWS to manage data processing requirements that cover multiple modalities—including near-real-time, on-demand, and batch mode—and to perform on-demand analysis of petabytes of data. Krux uses a combination of Apache Hadoop on Amazon Elastic MapReduce (Amazon EMR) and Apache Spark to run machine learning jobs and extract/transform/load (ETL) workloads, with Amazon Simple Storage Service (Amazon S3) as its core distributed storage system. Krux implemented the Amazon EMR infrastructure using Amazon EC2 Spot instances to gain access to compute capacity at reduced cost, using an internal framework to determine Spot bid prices.
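As an illustration of this pattern, the following sketch shows how a transient Amazon EMR cluster with Spot core nodes and a Spark step can be launched with the AWS SDK for Python (boto3). This is not Krux's actual code; the instance types, bid price, S3 paths, and job script are hypothetical placeholders.

```python
import boto3

emr = boto3.client("emr", region_name="us-east-1")

# Launch a transient EMR cluster whose core nodes run on Spot capacity.
# Instance types, bid prices, and S3 paths below are illustrative only.
response = emr.run_job_flow(
    Name="nightly-etl",
    ReleaseLabel="emr-5.30.0",
    Applications=[{"Name": "Spark"}],
    LogUri="s3://example-bucket/emr-logs/",
    Instances={
        "InstanceGroups": [
            {
                "Name": "master",
                "InstanceRole": "MASTER",
                "InstanceType": "m4.xlarge",
                "InstanceCount": 1,
                "Market": "ON_DEMAND",
            },
            {
                "Name": "core-spot",
                "InstanceRole": "CORE",
                "InstanceType": "m4.2xlarge",
                "InstanceCount": 10,
                "Market": "SPOT",
                "BidPrice": "0.20",  # a bid like this would come from an internal pricing framework
            },
        ],
        "KeepJobFlowAliveWhenNoSteps": False,  # terminate the cluster when the step finishes
    },
    Steps=[
        {
            "Name": "spark-etl",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", "s3://example-bucket/jobs/etl_job.py"],
            },
        }
    ],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print("Started cluster:", response["JobFlowId"])
```

Because the cluster terminates when its steps finish, compute is paid for only while a job runs, which is what makes Spot-backed, on-demand clusters economical for batch ETL.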

Krux architecture diagram on AWS

Krux uses AWS Data Pipeline, a service that moves data between different AWS compute and storage services, to schedule Apache Hadoop and Apache Spark jobs. Krux uses Amazon EMR to launch and manage clusters whenever they are needed to support its on-demand batch and data-science processing framework. It also uses Amazon DynamoDB to store user-segment membership data, which must be available across devices and applications globally.
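To make the DynamoDB piece concrete, here is a minimal sketch of how user-segment membership could be written and read with boto3. This is not Krux's production schema; the table name, key design, and attributes are assumptions for illustration.

```python
import boto3

dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
# Hypothetical table keyed by user ID; a real design might also partition by client.
table = dynamodb.Table("user_segment_membership")

# Record that a user (or device) belongs to a set of audience segments.
table.put_item(
    Item={
        "user_id": "device-1234",
        "segments": ["sports_fans", "frequent_readers"],
        "updated_at": "2016-05-01T12:00:00Z",
    }
)

# Look up that user's segments at personalization or ad-serving time.
response = table.get_item(Key={"user_id": "device-1234"})
print(response.get("Item", {}).get("segments", []))
```

A single-digit-millisecond key-value lookup like this is what allows segment membership to be served consistently to applications and devices around the world.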

The AWS services help Krux scale and deliver a highly flexible data-processing service to a global customer base. By using Apache Spark with Amazon EMR and AWS Data Pipeline, Krux has been able to increase the processing speed for its data science and ETL work by 100 percent.
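The kind of Spark ETL job involved might look like the following minimal PySpark sketch, which reads raw event logs from Amazon S3, aggregates page views per user, and writes the result back to S3. The paths, schema, and column names are illustrative assumptions, not Krux's actual pipeline.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pageview-aggregation").getOrCreate()

# Read raw page-view events from S3 (path and schema are placeholders).
events = spark.read.json("s3://example-bucket/raw-events/2016/05/01/")

# Count page views per user per site for the day.
daily_views = (
    events
    .where(F.col("event_type") == "page_view")
    .groupBy("user_id", "site_id")
    .agg(F.count("*").alias("page_views"))
)

# Write the aggregate back to S3 for downstream segmentation and analytics jobs.
daily_views.write.mode("overwrite").parquet(
    "s3://example-bucket/aggregates/page_views/2016-05-01/"
)
```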

Roopak Gupta, vice president of applications and data, says that by using AWS services, Krux has been able to increase the performance of its iterative jobs, making its systems faster and more efficient. Using Amazon EC2 Spot instances helps the company save up to 40 percent over Amazon EC2 Reserved Instance pricing and up to 80 percent over On-Demand pricing.

“With AWS, we can manage flexible capacity changes, contain overall costs on daily compute tasks, and manage overall infrastructure growth,” Gupta says. “At the same time, our clients have complete freedom to change their minds and evolve their segmentation strategies without limits, which would not be possible with traditional, non-cloud-based infrastructure environments. This additional flexibility enables our clients to experiment and innovate using our platform to ensure their business results.”

Learn more about running Big Data applications on AWS.