Intent Media uses predictive analytics to help travel brands maximize revenue per website visitor. Machine-learning capabilities enable the company to serve ads and conversion-boosting user experiences based on a unique understanding of where a given user is in the shopping journey.
Intent Media makes travel shopping easier for consumers, increases the profit per website visitor for travel publishers, and provides a new source of highly qualified traffic for travel advertisers. To achieve that, the company processes billions of records per day on the behavior of website visitors and applies advanced data science and machine learning to predict what future visitors will want—for example, that people searching for flights to Orlando, Florida, are really interested in a trip to Disney World.
Through late 2016, Intent Media supported two workloads on a cluster of 15 HPE Vertica nodes running on Amazon Elastic Compute Cloud (Amazon EC2). The first was a traditional data warehouse that hosted aggregated data. The second workload was focused on loading and querying raw event data. The Vertica environment presented several challenges:
• High costs. Intent Media had fixed Vertica licensing for 23 terabytes (TB) and was running up against that limit, which forced engineers to make trade-offs and stood in the way of business growth. Increasing the capacity of the Vertica environment would have required an up-front investment of six to seven figures, as well as the additional operational costs of running on Amazon EC2.
• Significant administrative overhead. Even though Vertica was running in the cloud, all administration and management was performed by Intent Media. Day-to-day operation of the environment required the equivalent of two full-time employees, just to keep things up and running.
• Unplanned downtime. Intent Media experienced significant issues with its Vertica environment a few times a year, along with frequent interruptions to its reporting environment. Those outages affected both customers and internal business decision-making.
In searching for a better solution, Intent Media CTO Paul “PJ” Julius wanted an analytics environment that could scale on demand so that his team wouldn’t need to worry about capacity limits. He also wanted to pay for that capacity as needed, instead of having to make a large up-front investment. And he wanted to free his team from the daily burden of administering the analytics environment. Finally, he wanted a partner to help bring this vision to reality, so that his team wouldn’t be distracted for several months by a migration project that, while needed, wouldn’t necessarily deliver any new business functionality.
Intent Media chose Full 360, an Advanced Consulting Partner of the AWS Partner Network (APN), to migrate its data warehouse workload from Vertica to Amazon Redshift. “We run almost all of our business on AWS, so our choice of Amazon Redshift was straightforward,” says Julius. “And by working with Full 360, we reduced the complexity of the project to little more than a line item on our budget.”
Adds Will Norman, Director of Engineering at Intent Media, “We were highly impressed with Full 360’s extensive knowledge of both Vertica and Amazon Redshift. We already had a high-level project plan, which, combined with Full 360’s expertise, enabled them to get started on the work almost immediately.”
Migration from Vertica to Amazon Redshift was smooth and successful. During the project, which took about three months, Full 360 extracted all data from Vertica to Amazon Simple Storage Service (Amazon S3) and performed the initial build and load into Amazon Redshift. In parallel, Full 360 updated Intent Media’s existing extract, transform, and load (ETL) pipeline to work with Amazon Redshift. Full 360 then used its proprietary UpShift methodology to tune the database’s table structures for Intent Media’s workloads and measured query performance to ensure that users wouldn’t be adversely affected.
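Full 360’s UpShift methodology is proprietary, and the case study does not describe the load mechanics. As a rough illustration only: the Amazon Redshift `COPY` command can load every file under an S3 prefix in a single statement, so a table exported from Vertica to S3 might be loaded with a statement like the one this sketch builds. The table name, bucket, IAM role ARN, pipe delimiter, and gzip compression are all hypothetical assumptions, not details of Intent Media’s pipeline.

```python
def build_copy_statement(table: str, s3_prefix: str, iam_role_arn: str,
                         delimiter: str = "|", gzip: bool = True) -> str:
    """Return a Redshift COPY command that loads every file under s3_prefix.

    Assumes (hypothetically) that the export wrote delimited text files,
    optionally gzipped, under one S3 prefix per table. Values are
    interpolated directly, so only pass trusted identifiers.
    """
    if not s3_prefix.startswith("s3://"):
        raise ValueError("s3_prefix must be an s3:// URI")
    lines = [
        f"COPY {table}",
        f"FROM '{s3_prefix}'",
        f"IAM_ROLE '{iam_role_arn}'",
        f"DELIMITER '{delimiter}'",
    ]
    if gzip:
        lines.append("GZIP")
    return "\n".join(lines) + ";"


# Hypothetical names, for illustration only.
print(build_copy_statement(
    "analytics.daily_aggregates",
    "s3://example-export-bucket/vertica/daily_aggregates/",
    "arn:aws:iam::123456789012:role/example-redshift-load",
))
```

Generating one statement per exported table keeps the initial load repeatable, which helps if a table has to be reloaded after its structure is tuned.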
“User queries are the core of UpShift,” explains Jeremy Winters, Lead Solution Architect for the Data Warehousing Practice at Full 360. “Queries from the production Vertica cluster were filtered and boiled down to an indicative test set of nearly 1,200 queries, which were then fed into a JMeter test suite to capture query timings in isolation and under concurrent load. Results were compared to the timings for the same queries in Amazon Redshift to quantify changes in performance at the query level and at the overall system level.”
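The comparison step Winters describes can be sketched in Python, assuming the per-query timings (in seconds) from each system have already been pulled out of the JMeter results into dictionaries keyed by query ID. The function names, the 1.5-times regression threshold, and the use of median and geometric-mean ratios as system-level summaries are illustrative assumptions; this is not UpShift itself, and it ignores the concurrent-load runs.

```python
from __future__ import annotations

import statistics
from dataclasses import dataclass


@dataclass
class QueryComparison:
    query_id: str
    vertica_seconds: float
    redshift_seconds: float

    @property
    def ratio(self) -> float:
        """Redshift time / Vertica time: above 1.0 is slower, below 1.0 is faster."""
        return self.redshift_seconds / self.vertica_seconds


def compare_timings(vertica: dict[str, float], redshift: dict[str, float],
                    slowdown_threshold: float = 1.5) -> tuple[list[QueryComparison], dict]:
    """Compare timings for the queries present in both runs.

    Raises statistics.StatisticsError if the two runs share no query IDs.
    """
    common = sorted(vertica.keys() & redshift.keys())
    rows = [QueryComparison(q, vertica[q], redshift[q]) for q in common]
    ratios = [r.ratio for r in rows]
    summary = {
        "queries_compared": len(rows),
        # Median and geometric mean keep a few very long queries from dominating.
        "median_ratio": statistics.median(ratios),
        "geomean_ratio": statistics.geometric_mean(ratios),
        # Total time across the test set approximates the system-level change.
        "total_vertica_s": sum(r.vertica_seconds for r in rows),
        "total_redshift_s": sum(r.redshift_seconds for r in rows),
        # Query-level view: which queries got notably slower.
        "regressions": [r.query_id for r in rows if r.ratio > slowdown_threshold],
    }
    return rows, summary
```

Reporting both a per-query regression list and aggregate ratios mirrors the two levels the quote mentions: individual queries and the overall system.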
Adds Norman, “With Full 360’s extensive domain knowledge, there was very little trial and error—their initial approaches proved to be right 99 percent of the time. Query performance is about the same as before, which is pretty impressive, considering we went from 15 Vertica nodes to four Amazon Redshift nodes. Nightly data loads are about three times faster since Redshift takes advantage of parallel loading.”
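The Amazon Redshift documentation recommends splitting load data into multiple files so that the file count is a multiple of the number of slices in the cluster, which lets every slice load its share in parallel. A minimal sketch of that planning step follows. The number of slices per node depends on the node type and is an assumed input here, and using row counts as a stand-in for file size is a simplification.

```python
from __future__ import annotations

import math


def plan_file_count(row_count: int, nodes: int, slices_per_node: int,
                    max_rows_per_file: int) -> int:
    """Pick a file count that is a multiple of the cluster's slice count."""
    slices = nodes * slices_per_node
    # Enough files that no single file exceeds max_rows_per_file...
    needed = math.ceil(row_count / max_rows_per_file)
    # ...rounded up to a multiple of the slice count, with at least one file per slice.
    return max(slices, math.ceil(needed / slices) * slices)


def split_rows(rows, file_count: int) -> list[list]:
    """Distribute rows round-robin so the files end up roughly equal in size."""
    buckets = [[] for _ in range(file_count)]
    for i, row in enumerate(rows):
        buckets[i % file_count].append(row)
    return buckets


# Example: a four-node cluster, assuming two slices per node.
count = plan_file_count(row_count=1_000_000, nodes=4, slices_per_node=2,
                        max_rows_per_file=50_000)
print(count)  # 24: 20 files needed, rounded up to a multiple of 8 slices
```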
As part of the engagement, Full 360 also provided Intent Media with advanced training, held at Amazon’s New York office, on how to tune Amazon Redshift.
As illustrated in the diagram, Intent Media’s data warehouse running on Amazon Redshift is part of a larger solution running on AWS:
• The process begins with more than a billion raw log entries per day, which are ingested in near-real-time using Amazon Kinesis.
• Event data is stored in Amazon S3, which contains petabytes of such data at any given time. An analysis covering a single 30-day period may examine 10 terabytes.
• Nightly Apache Spark jobs, which are scheduled using Apache Airflow and run on Amazon EMR and Amazon EC2, handle the ETL process and load the data into Amazon Redshift.
• The data warehouse also consumes reference information that is stored in Amazon Relational Database Service (Amazon RDS) and synchronized to Amazon Redshift using FlyData, an Amazon Redshift Partner.
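The ingestion step at the start of this pipeline can be sketched with the Amazon Kinesis `PutRecords` API, which accepts at most 500 records per request. The sketch takes a boto3-style Kinesis client as a parameter (so it can be exercised with a stub), sends events in batches, and returns any records the service reports as failed so the caller can retry them. The stream name and the `visitor_id` partition-key field are hypothetical, and the per-request and per-record size limits are not enforced here.

```python
from __future__ import annotations

import json
from typing import Any, Iterator, Sequence

# PutRecords accepts at most 500 records per request.
MAX_RECORDS_PER_CALL = 500


def chunked(items: Sequence[Any], size: int) -> Iterator[Sequence[Any]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def put_events(kinesis_client: Any, stream_name: str,
               events: Sequence[dict]) -> list[dict]:
    """Send events to a Kinesis stream in batches and return the failed records.

    `kinesis_client` is anything with a boto3-style put_records method.
    """
    failed: list[dict] = []
    for batch in chunked(events, MAX_RECORDS_PER_CALL):
        records = [
            {
                "Data": json.dumps(event).encode("utf-8"),
                # Hypothetical field: records with the same key go to the same shard.
                "PartitionKey": str(event["visitor_id"]),
            }
            for event in batch
        ]
        response = kinesis_client.put_records(StreamName=stream_name, Records=records)
        if response.get("FailedRecordCount", 0):
            # Results come back in request order; failed entries carry an ErrorCode.
            for record, result in zip(records, response["Records"]):
                if "ErrorCode" in result:
                    failed.append(record)
    return failed
```

In use, the client would be `boto3.client("kinesis")`, and a production producer would retry the returned records with backoff rather than dropping them.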
Intent Media uses Terraform by HashiCorp, an APN Technology Partner and AWS DevOps Competency Partner, to provide infrastructure-as-code capabilities similar to those provided by AWS CloudFormation. The company monitors the environment using Amazon CloudWatch and, since completing the migration, has started using Amazon Athena to run SQL queries directly against semistructured event data in Amazon S3.
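Athena bills by the amount of data a query scans, so queries over large event stores are commonly restricted to a date range. The sketch below assumes a hypothetical table partitioned by a `dt` string column in `YYYY-MM-DD` form, builds a 30-day query in the spirit of the 30-day analyses mentioned above, and runs it through a boto3-style Athena client supplied by the caller. The table, column, database, and output-location names are all assumptions.

```python
import time
from datetime import date, timedelta
from typing import Any


def last_n_days_query(table: str, end: date, days: int = 30) -> str:
    """Build a query limited to the `days` date partitions ending on `end`."""
    start = end - timedelta(days=days - 1)  # inclusive window
    return (
        f"SELECT event_type, count(*) AS events FROM {table} "
        f"WHERE dt BETWEEN '{start.isoformat()}' AND '{end.isoformat()}' "
        f"GROUP BY event_type"
    )


def run_athena_query(athena_client: Any, sql: str, database: str,
                     output_location: str, poll_seconds: float = 1.0) -> str:
    """Start an Athena query, wait for it to finish, and return its execution ID."""
    query_id = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]
    while True:
        status = athena_client.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state == "SUCCEEDED":
            return query_id
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"Athena query {query_id} ended in state {state}")
        time.sleep(poll_seconds)  # still QUEUED or RUNNING
```

With a real client (`boto3.client("athena")`), the returned ID can be passed to `get_query_results` to page through the output.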
With help from Full 360, Intent Media migrated its data warehouse workload from Vertica to Amazon Redshift quickly and smoothly, without diverting internal engineering resources from other business objectives. The company’s new analytics platform is more scalable and less expensive, offers improved availability, and has eliminated a good deal of day-to-day administrative work. Key benefits include:
• On-demand scalability. Intent Media no longer needs to worry about the capacity of its analytics platform. Moving forward, when more headroom is needed, the company can simply use the AWS Management Console or the ModifyCluster API to increase the number of nodes in its data warehouse cluster and have the resize start immediately. “We’re more agile because we no longer need to worry about the size of our data warehouse or make trade-offs to stay below its capacity,” says Julius. “Instead, we have the freedom to do whatever’s best for the business.”
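A scheduled capacity check built on `ModifyCluster` might look like the following sketch. It takes a boto3-style Redshift client as a parameter; the storage-per-node figure, the 70 percent utilization target, and the two-node minimum are hypothetical inputs that in practice would come from the node type and from CloudWatch metrics. The sketch passes `NodeType` together with `NumberOfNodes`, as the ModifyCluster reference asks when changing the node count, and because a resize requested this way can leave the cluster read-only while data is redistributed, a job like this would typically run in a quiet window.

```python
import math
from typing import Any


def target_node_count(used_tb: float, tb_per_node: float,
                      max_utilization: float = 0.7, min_nodes: int = 2) -> int:
    """Smallest node count that keeps disk usage at or below max_utilization."""
    return max(min_nodes, math.ceil(used_tb / (tb_per_node * max_utilization)))


def grow_cluster_if_needed(redshift_client: Any, cluster_id: str, node_type: str,
                           current_nodes: int, used_tb: float,
                           tb_per_node: float) -> int:
    """Request a larger cluster via ModifyCluster when usage exceeds the target.

    Only grows the cluster; shrinking is left as a manual decision.
    Returns the node count the cluster will have after the request.
    """
    target = target_node_count(used_tb, tb_per_node)
    if target > current_nodes:
        redshift_client.modify_cluster(
            ClusterIdentifier=cluster_id,
            ClusterType="multi-node",
            NodeType=node_type,
            NumberOfNodes=target,
        )
        return target
    return current_nodes
```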
• Lower costs. By moving its aggregated data to Amazon Redshift, Intent Media avoided the immediate six- to seven-figure investment that more capacity on Vertica would have required. Based on existing data volumes, Julius estimates that the move to Amazon Redshift will save the company $1.5 million per year in operating expenses. Just as important, the company can pay for additional capacity as it’s needed, in proportion to future revenue growth.
• Less system administration. Because Amazon Redshift is a fully managed service, Intent Media’s engineering team will no longer need to worry about things such as monitoring, snapshots, and system upgrades and maintenance. That will enable the team to shift a large part of the roughly 40 hours a week that were required to keep Vertica up and running to more important and productive efforts, such as the delivery of new business value.
• Improved availability. Having a fully managed service has also decreased downtime, resulting in fewer interruptions to business decision-making and happier customers. For Norman, it also means fewer pages at 2 o’clock in the morning to, as he puts it, “scramble to something that’s broken.”
While the above benefits are powerful on their own, the advantages of working with Full 360 to achieve them are also noteworthy. “We could have done the work on our own, but that wouldn’t have been as beneficial,” says Julius. “By working with Full 360, we received a guaranteed deliverable, and our internal team was able to stay focused on furthering our business objectives instead of getting dragged into the drudgery of migrating queries and so on. From a business perspective, in terms of everything from opportunity cost to employee satisfaction, the decision to use Full 360 was an easy one.”
• Full 360 is an Advanced Consulting Partner of the AWS Partner Network (APN) with three professional services practices (Big Data, Microservices, and CloudOps) that leverage years of expertise to deliver world-class, high-performance, data-centric applications.
• For more information about how Full 360 can help your company build and manage your AWS environment, see the Full 360 listing in the AWS Partner Directory.