A Travel Tech CEO’s journey from cloud skeptic to evangelist
I am the CEO of 3Victors, a travel big data AI startup, and I spent the better part of my career solving tricky digital problems by tuning software to squeeze maximum performance from racks of bare metal computers. Naturally, I was more than a bit skeptical of cloud computing.
First, let me provide a little insight into my company and why maximum performance of data analytics is so critical. 3Victors ingests over 1 billion worldwide travel searches and 200 billion itineraries in real-time DAILY, from multiple data sources including the world’s largest reservation systems. We provide data analytics solutions with insights into market trends, pricing, and lowest fares. Whether you need real-time access or an hourly, daily, or weekly summary, we have a solution to meet your needs. Simplifying big data helps our clients increase engagement, conversion, and bookings, and optimize the acquisition and retention of travel shoppers.
In hindsight, my cloud computing skepticism was less about the hubris of visualizing the incremental accomplishments of development teams over decades, distilled down to puffy billows on architectural renderings. Rather, it was about the boots-on-the-ground reality and abject fear of the time, effort and cost of transitioning to the cloud and, likely far worse, the fear of not doing so.
The handwriting was on the wall. Sitting on my perfectly kempt desk (photo not included) were a major renewal of our co-location facility agreement (10 months hence), a stack of computer lease paper nearing the dreaded $1 “end of life” and, ultimately the last straw, a 1-inch-thick hardware inventory list with a property tax coupon affixed – due upon receipt.
Testing the waters
First, I asked my co-founder and CTO, an even smarter carbon copy skeptic, to pull together a rough estimate of what it would take to migrate our technology: a hefty commercially licensed Hadoop cluster with all the appropriate infrastructure and analytics trimmings, plus its 700 terabyte data lakes. Tick-tock, the swag came back a week later – an order of magnitude higher than current costs!
Eyes rolling, I knew from past experience what our team was actually saying – “please, oh please, don’t make us do this” – and as you might imagine, it had the opposite effect. A cloud deep dive was in the offing.
Could the cloud headline bullet points really be true? More time spent on building, infinite capacity, cost saving elasticity/storage options, and 90% savings on heavy analytical, ETL and machine learning workloads. After a thoroughly unscientific review of the major cloud vendors, we decided to prototype with AWS, ostensibly due to its travel and hospitality industry experience and a positive cursory glance over its pudgy list of services, particularly the ones claiming no fuss management.
Beginning the journey – how do we scale?
So the journey began, the maiden voyage headlining dozens of experiments, leading off with a complete flyer. How about just skipping over all the instance/server mumbo jumbo and shooting for the brass ring — serverless? Even more fun, let’s start with something simple — the streaming capture of 10,000 messages per second (our then current production peak volume).
Prototyped in a few days (no bull), we quickly learned Cloud Lesson #1 – the darn thing really does scale, but the costs can be eye-popping when the service charges don’t match the requirements of the use case. In this case, the combination of Amazon API Gateway, AWS Lambda, and Amazon Kinesis for the high velocity capture workload actually cost more than the original “outlandish” estimate (think $60k/mo for 30 billion API requests a month, gulp). That led us to Cloud Lesson #2 – there are more than a few cunning ways to skin a cat, especially given such a rich suite of tools that, when combined in different ways, can radically change the cost/performance equation. Luckily, the follow-on experiment with AWS Elastic Beanstalk/Kinesis was spot on (pun intended).
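For a rough sense of scale, the figures quoted above can be sanity-checked with a few lines of arithmetic. This is only a sketch: the 30-day month and the helper names are my own assumptions, and the implied unit cost is simply the quoted $60k divided by the quoted 30 billion requests, not a published price for any service.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def monthly_messages(per_second: int, days: int = 30) -> int:
    """Messages per month if the given rate were sustained around the clock.

    The 30-day month is an assumption made for illustration.
    """
    return per_second * SECONDS_PER_DAY * days


def implied_cost_per_million(monthly_cost_usd: float, monthly_requests: int) -> float:
    """Blended cost per million requests implied by a monthly bill."""
    return monthly_cost_usd / (monthly_requests / 1_000_000)


# 10,000 messages per second sustained for 30 days
sustained = monthly_messages(10_000)

# Blended cost implied by "$60k/mo for 30 billion API requests"
implied = implied_cost_per_million(60_000, 30_000_000_000)

print(f"{sustained:,} messages/month at peak rate")
print(f"${implied:.2f} per million requests (implied)")
```

Running the peak rate flat out for a month lands in the same ballpark as the 30 billion requests quoted, and at that volume even a couple of dollars per million requests adds up to a five-figure monthly bill.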
Cloud Lesson #3 revealed itself during these initial experiments – nothing is perfect. In this case, Amazon Kinesis didn’t have a built-in auto sharding option, specifically one that auto-magically scales in and out (elasticity) to match variations in the capture load. Further, the by-the-byte billing rounded each record up to an arbitrary increment (5k at the time), forcing us to add micro batching logic to the processing pipeline to prevent small-record round up cost bloat.
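To illustrate why micro batching tames round up billing, here is a minimal sketch. It is not our production code: it assumes the “5k” increment means 5 KiB, that records contain no newline bytes (so newline can serve as the separator), and the names `billed_units` and `micro_batch` are hypothetical.

```python
import math
from typing import Iterable, Iterator, List

# Assumption: the "5k" round up increment is read as 5 KiB.
BILLING_INCREMENT = 5 * 1024


def billed_units(payload_size: int, increment: int = BILLING_INCREMENT) -> int:
    """Billing increments charged for one record after rounding up."""
    return max(1, math.ceil(payload_size / increment))


def micro_batch(records: Iterable[bytes],
                max_bytes: int = BILLING_INCREMENT) -> Iterator[bytes]:
    """Pack small records into newline-delimited payloads of at most max_bytes.

    A single record larger than max_bytes is emitted on its own.
    """
    batch: List[bytes] = []
    size = 0  # length of the payload if the batch were joined right now
    for record in records:
        needed = len(record) + (1 if batch else 0)  # +1 for the separator
        if batch and size + needed > max_bytes:
            yield b"\n".join(batch)
            batch, size = [], 0
            needed = len(record)
        batch.append(record)
        size += needed
    if batch:
        yield b"\n".join(batch)


# 100 small records of 200 bytes each (made-up sample data)
records = [b"x" * 200 for _ in range(100)]
batches = list(micro_batch(records))

unbatched = sum(billed_units(len(r)) for r in records)
batched = sum(billed_units(len(b)) for b in batches)
print(f"unbatched: {unbatched} units, batched: {batched} units")
```

Sent one at a time, every 200-byte record is billed as a full increment. Packed together, 25 of them fit in a single increment, so the same data is billed at a small fraction of the cost. The trade-off is a little extra latency while a batch fills and the need for consumers to split payloads back into records.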
Embracing the community. Supercharging experiments.
Sorting out these imperfections led us to Cloud Lesson #4. The AWS builder community of vibrant product managers and technical resources is accessible if you are willing to poke, prod, cajole, and most importantly, actively participate. Case in point – the Amazon Kinesis team subsequently relaxed the round up billing mechanism, and I have a vivid recollection of a lively conversation that ran for several days between our CTO and one of the AWS developers working on the (new at the time) Java asynchronous Kinesis SDK.
With the lessons learned on these initial capture experiments under our belts, the next step was to figure out how and where to permanently store our ever-growing time series data. In the Hadoop world, you are drilled from infancy to keep your data “next to the CPU” (i.e. off the “slow” network). Oops! EMR (AWS managed Hadoop) states this technique is an “old school” anti-pattern and recommends the polar opposite, namely de-coupling compute and storage (i.e. putting the data on the “slow” network). Even more mind bending was the capability of spinning up transient clusters in lieu of the scheduling nightmare we were living through in our monolithic cluster. The icing on the cake was the ability to use excess compute capacity at a heavy discount.
The pace of experiments accelerated: Amazon S3 or HDFS (nay, both), transient cluster spin up times, spot task instances, EMRFS S3 read performance, optimal file size for ETL, path layout/formats for schema-on-read and cost optimized storage class lifecycle transitions, oh my. This fostered another axiom, Cloud Lesson #5 – embrace new paradigms, even when they go back to the future. It turned out it was going to take us months to copy our on-premises data to the cloud over what I thought was a fairly beefy Internet connection – clearly not an option. The solution (don’t giggle) was to snail mail the data via UPS ground on a set of contraptions named AWS Snowball.
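A back-of-the-envelope calculation shows why shipping disks can beat the wire. This is a sketch under stated assumptions: the 700 TB figure is the data lake size mentioned earlier, but the link speeds and the 50% utilization factor are hypothetical values chosen for illustration, not our actual connection.

```python
def transfer_days(data_terabytes: float, link_mbps: float,
                  utilization: float = 1.0) -> float:
    """Days needed to push data over a network link.

    Uses decimal units (1 TB = 10**12 bytes, 1 Mbps = 10**6 bits/s).
    utilization is the fraction of the link actually available for the copy.
    """
    bits = data_terabytes * 1e12 * 8
    seconds = bits / (link_mbps * 1e6 * utilization)
    return seconds / 86_400


# Hypothetical scenarios for a 700 TB data lake
for mbps, util in [(1_000, 1.0), (1_000, 0.5), (500, 0.5)]:
    days = transfer_days(700, mbps, util)
    print(f"{mbps} Mbps at {util:.0%} utilization: {days:.0f} days")
```

Even a fully dedicated 1 Gbps link needs roughly two months for 700 TB, and sharing that link with production traffic stretches the copy well beyond that.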
Learning as we go
At this point, the creative juices of the development team were flowing. One tricky problem was peeling the data off the durable stream buffer and storing 2 GB files (objects) in a popular big data format for downstream processing. This puzzling problem stimulated Cloud Lesson #6 – think outside the box, nuggets abound in the wealth of offered services. Case in point, the team proposed a technique intended for partial video uploads (not nested big data) to solve the issue. A quick cross check with the AWS API tech team yielded a shoulder shrug (“never seen anyone try that before”). Fast forward to today: we store over 4 PB, growing by 12 TB a day, and continually use this clever technique.
As you might expect, cloud or not, big data implies hefty storage costs. Previously we had to make tough decisions on what to delete in order to effectively manage budget-constrained storage. The cloud certainly removed the capacity constraints, but the storage options, with their associated limitations and prices ranging (at the time) from $4k to $23k per petabyte-month, were going to make or break the viability of this transition.
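To see how wide that spread is in practice, here is a small sketch that applies the two price points to the 4 PB and 12 TB/day figures quoted earlier. It assumes a 30-day month, decimal units, and (unrealistically) that all data sits in a single storage class; the function name is hypothetical.

```python
def projected_monthly_cost(start_pb: float, growth_tb_per_day: float,
                           months: int, usd_per_pb_month: float,
                           days_per_month: int = 30) -> float:
    """Monthly storage bill after `months` of linear growth, one storage class."""
    total_pb = start_pb + growth_tb_per_day * days_per_month * months / 1000
    return total_pb * usd_per_pb_month


LOW_USD_PER_PB_MONTH = 4_000    # low end of the quoted range
HIGH_USD_PER_PB_MONTH = 23_000  # high end of the quoted range

for months in (0, 12):
    low = projected_monthly_cost(4, 12, months, LOW_USD_PER_PB_MONTH)
    high = projected_monthly_cost(4, 12, months, HIGH_USD_PER_PB_MONTH)
    print(f"after {months:2d} months: ${low:,.0f} to ${high:,.0f} per month")
```

At 4 PB the gap between the cheapest and priciest option is already tens of thousands of dollars a month, and a year of growth at 12 TB a day roughly doubles both ends. A real bill would mix classes and include request and retrieval charges, which this sketch ignores.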
Staying on top of things
A lingering question hung in the air. Was Moore’s Law going to apply to our cost equation? Cloud Lesson #7 – sign an NDA with AWS and stay on top of the service roadmaps. In our case, we modeled into our budget a change in storage class from Amazon Glacier to Glacier Deep Archive (75% savings). Further, on day one, when Amazon S3 Intelligent Tiering was rolled out, all our buckets were defaulted to this new pseudo storage class. That saved us time (not having to deal with complicated transition policy plumbing) and the hassle of dealing with the inevitable cost overages as data scientists leave terabyte cookie crumbs strewn about as they tinker with big data. Cloud Lesson #8 – embrace the chaos.
I could go on about dealing with the torrid pace of cloud technology change, especially as the competition heats up and each cloud vendor tries to one-up the other. Suffice it to say there is no apparent option other than to incorporate this craziness into the organizational DNA. Speaking of organizations, one major flaw in our experimentation taught us Cloud Lesson #9 – spend the time and effort to get production, pre-prod, development, and vendor cloud plumbing sorted with the appropriate security blanket. It’s not sexy, but oh so important.
Alas, I will fast forward over the MySQL to Amazon Aurora transition (less than 20 lines of code change), the hundreds of thousands of lines of Java ETL code transitioned with nary a change, real-time fanout, schema on read analytics, productionized columnar/graph database prototypes and machine learning pipelines, all bundled into a relatively young DaaS (data as a service) business model. For those interested, you can quickly peruse the recently published 3Victors AWS reference architecture.
A transition complete. A new chapter begins.
On November 1st, 2018 (a day that will live on in 3Victors infamy), after 4 months of running in parallel and comparing endless samples of data for accuracy, 3V became a 100% cloud-based company. I am still trying to dab the metaphorical tear from the corner of my eye (after all, I am the CEO of a growing startup with an additional 2-year product roadmap to deliver).
Most journeys take unexpected twists and turns, and this was no exception. As our new data clients began to onboard, we learned about their journeys. While swapping war stories about our recent cloud-fostered gains in productivity, cost, and turnaround times, several asked if we could temporarily host a variety of use cases and help them refine their digital transformation strategies. Our business model revolves around monthly data and insight subscriptions and clearly it was in our best interest to make sure our downstream client use cases optimally integrated with our products. So we added strategic consulting to our product offering and recently joined the AWS Partner Network to facilitate this unintended twist.
In summary, Cloud Lesson #10 – what started out as a healthy dose of skepticism around cloud computing turned at least this tech curmudgeon into a full-throated cloud advocate. To be continued…
Learn more about how AWS is helping transform the travel and hospitality industry at aws.com/travel