AdRoll is a global leader in retargeting with more than 10,000 active advertisers across over 100 countries. The company provides cross-platform reach across large display inventory sources and tools that personalize ad campaigns based on a visitor’s browsing behavior. Founded in 2007 and based in San Francisco, Calif., the company handles ads for brands including Salesforce, Tableau, Alex and Ani, Rickshaw Bags, and Hipmunk.
Ad retargeting is about converting site visitors to customers. Retargeting is a revenue driver for online businesses worldwide, and AdRoll is one of the industry leaders, growing 15,000 percent in 2012. But to effectively serve up ads, AdRoll needs the flexibility to add capacity at a moment’s notice, rapid-fire response times to win bids in real time, and the automation to ensure that the system can respond to bids quickly.
“We need high performance, but we need more than that,” says Valentino Volonghi, CTO. “We need flexibility, and we need software that could scale across multiple data centers and machines, software we could optimize as we go. Moving our operations to the cloud was really our only option.”
In rolling out its real-time bidding infrastructure, AdRoll needed to sync data for every user across four regions, on the order of hundreds of millions of users and tens of thousands of writes per second. Not only must the company write this data in real time, but the bidding system also has a hard cap of 100 milliseconds for every bid request, so AdRoll needs strong guarantees on read performance.
AdRoll started out with Amazon Simple Storage Service (Amazon S3). Getting the AWS environment up and running took about two weeks, and AdRoll is now storing 1.5 PB of data in Amazon S3. Before long, AdRoll realized that AWS could be helpful for more than just storage, so the company began moving more of its systems to the AWS Cloud. Now, the core of AdRoll’s site runs on 30 Amazon Elastic Compute Cloud (Amazon EC2) instances. Additional instances—anywhere from 200 to 1,000 of them, including Amazon EC2 Spot Instances—are used for variable capacity. “Automation is key in this business,” Volonghi says. “If any one of those instances were to fail, they would replace themselves and keep running without any human intervention.”
Beyond storage and compute solutions, AdRoll also needed a high-performance database solution to meet its 100-millisecond latency requirement. After evaluating multiple alternatives, the company decided on DynamoDB for its low latency, guaranteed throughput, and ability to scale quickly.
DynamoDB is a NoSQL database service with guaranteed throughput and single-digit millisecond latency. As a fully managed service, DynamoDB provides automatic three-way replication and seamless throughput and storage scaling via API and an easy-to-use management console.
DynamoDB tables are composed of a primary key (hash, or hash and range) and attributes. The schemaless design means that each data item may have a different number of attributes. Multiple data types (strings, numbers, binary data, and sets) add richness to the data model.
AdRoll’s tables were designed to use the cookie as the hash key and the profile ID as the range key, with a timestamp as the attribute.
|Hash Key|Range Key|Attribute|
|Cookie (User ID)|Profile|Timestamp|
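The key design described above can be sketched as DynamoDB request parameters. This is a minimal illustration rather than AdRoll's actual schema: the table name `user_profiles`, the attribute names `cookie`, `profile_id`, and `timestamp`, and the choice of string keys are all assumptions, since the case study does not give them. The code only builds the request dictionaries and does not call AWS.

```python
# Hypothetical names: the case study does not state table or attribute names.
TABLE_NAME = "user_profiles"


def build_create_table_request(read_units, write_units):
    """Return CreateTable parameters for a hash-and-range table.

    Only key attributes are declared up front; non-key attributes such as
    the timestamp are schemaless and appear only on the items themselves.
    """
    return {
        "TableName": TABLE_NAME,
        "AttributeDefinitions": [
            {"AttributeName": "cookie", "AttributeType": "S"},
            {"AttributeName": "profile_id", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "cookie", "KeyType": "HASH"},       # hash key
            {"AttributeName": "profile_id", "KeyType": "RANGE"},  # range key
        ],
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    }


def build_item(cookie, profile_id, timestamp):
    """Return one item in DynamoDB's typed attribute-value format."""
    return {
        "cookie": {"S": cookie},
        "profile_id": {"S": profile_id},
        # Numbers are sent as strings inside the "N" type wrapper.
        "timestamp": {"N": str(timestamp)},
    }
```

A dictionary like this could be passed to a DynamoDB client's CreateTable call; the capacity values are placeholders to be sized for the real workload.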
AdRoll uses hash-and-range primary keys for all of its tables. “Hash-and-range keys allow us to use a single API, BatchWriteItem, to modify multiple items belonging to the same or different hash keys,” Volonghi says. “They also allow us to query the data very efficiently, by condensing the results of read operations into the smallest possible payload. This saves on both storage and throughput costs.”
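To illustrate the two access patterns Volonghi mentions, the sketch below builds BatchWriteItem and Query request payloads for a hash-and-range table. It is not AdRoll's client code: the attribute names `cookie`, `profile_id`, and `timestamp` are assumed for illustration, and the functions only construct request dictionaries without contacting AWS. The 25-request limit per BatchWriteItem call is a documented DynamoDB constraint.

```python
MAX_BATCH_SIZE = 25  # BatchWriteItem accepts at most 25 put/delete requests


def build_batch_write_requests(table_name, records):
    """Split (cookie, profile_id, timestamp) records into BatchWriteItem payloads.

    Records may share a hash key or use different ones; one call handles both.
    """
    puts = [
        {
            "PutRequest": {
                "Item": {
                    "cookie": {"S": cookie},          # hash key (assumed name)
                    "profile_id": {"S": profile_id},  # range key (assumed name)
                    "timestamp": {"N": str(timestamp)},
                }
            }
        }
        for cookie, profile_id, timestamp in records
    ]
    return [
        {"RequestItems": {table_name: puts[i:i + MAX_BATCH_SIZE]}}
        for i in range(0, len(puts), MAX_BATCH_SIZE)
    ]


def build_query_request(table_name, cookie):
    """Build a Query payload that fetches every profile for one cookie."""
    return {
        "TableName": table_name,
        "KeyConditionExpression": "#c = :c",
        # A name placeholder avoids any clash with DynamoDB reserved words.
        "ExpressionAttributeNames": {"#c": "cookie"},
        "ExpressionAttributeValues": {":c": {"S": cookie}},
    }
```

Because the query is keyed on the hash key alone, a single request returns all range-key entries for that user, which is one way to keep read payloads small.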
To get the most out of DynamoDB, AdRoll developed its own DynamoDB client. “We’ve been using it across hundreds of machines to quickly query DynamoDB with a consistently low latency across our Erlang infrastructure,” Volonghi says. “We just write to it, measure the write throughput and read throughput, and get the benefits without having to dive into the details.” The AdRoll team recommends setting alerts on write throughput for both low and high levels to understand when capacity is running low or the system is down.
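The alerting recommendation can be sketched as a simple threshold check. This is an illustrative assumption about how such an alert might be evaluated, not AdRoll's monitoring setup: the function name, the 10 percent and 80 percent thresholds, and the status labels are all hypothetical. A high reading suggests provisioned capacity is running low, while an unusually low reading suggests writers may be down.

```python
def classify_write_throughput(consumed_wps, provisioned_wps,
                              low_fraction=0.1, high_fraction=0.8):
    """Classify observed write throughput against provisioned capacity.

    Returns "LOW" when writes have nearly stopped (possible outage upstream),
    "HIGH" when usage approaches the provisioned limit, and "OK" otherwise.
    The default thresholds are hypothetical and should be tuned per workload.
    """
    if provisioned_wps <= 0:
        raise ValueError("provisioned_wps must be positive")
    utilization = consumed_wps / provisioned_wps
    if utilization <= low_fraction:
        return "LOW"
    if utilization >= high_fraction:
        return "HIGH"
    return "OK"
```

In practice the consumed figure would come from a metrics system, and the two alert states would be routed to whoever is on call.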
By using Amazon DynamoDB in conjunction with Apache Storm, AdRoll can replicate its data set across the globe in under 50 milliseconds, providing speedy response times for both bidding and serving up ads to customers—while keeping costs low.
AdRoll also benefits from the scalability provided by AWS. “AWS provides us with the capability to handle traffic that comes from Facebook, Google, Yahoo, and other heavily traveled sites—so that we can serve up more than 50 billion impressions a day,” Volonghi says. “It’s cost-effective, too—we spend more on snacks than we do on Amazon DynamoDB.”
Using AWS has made it easy for AdRoll to onboard new customers. When a new customer comes on board, AdRoll’s machines need to be able to handle all additional traffic instantly. In traditional on-premises infrastructures, onboarding a new customer usually means going through an approval process to get new machines, add them to the Hadoop cluster, and procure more storage, which can take up to 90 days. “With AWS, we don’t have to worry about any of that,” Volonghi says. “If we’re close to capacity, we just Auto Scale a few new instances and we’re done.”
The company can quickly build business by joining new exchanges, no matter where they are physically located. “AWS has regions close to all the worldwide traffic exchanges, so when a new exchange comes on board, we can take advantage of it immediately,” Volonghi says. “It’s as simple as flipping a switch and opening a new data center where our machines can get traffic. After that, we can start bidding. Easy.”
Volonghi credits AWS with providing the scalability and capacity on demand that AdRoll needed to build its business. “When our business was growing really fast, using AWS allowed us to scale and optimize our algorithms—and get rid of extra capacity. AWS saves us time and money. We don’t need a bigger data center, we don’t need to get more operations people on board, and we don’t need to acquire more machines just because we have to scale up.”
To learn more about DynamoDB, visit our Amazon DynamoDB details page: http://aws.amazon.com/dynamodb/.