AWS Cloud Enterprise Strategy Blog

Don’t Fly Solo on Your Migration — Use the Cloud Buddy System

Lessons From the Edmunds.com All-in Migration to AWS

The notion of a buddy system has been used for decades in many facets of life, including school, work, and adventure. Whether it’s a college freshman being paired with an upperclassman at orientation, an Air Force pilot and his or her wingman, or your weekend scuba diving partner, most buddy systems serve one of two purposes. The first is safety, usually in sports or dangerous activities where you watch each other’s backs. And the second is to provide new students or workers with training and guidance when they’re paired with a more experienced buddy in order to avoid common, first-time pitfalls and, thus, progress more quickly with confidence.

Speaking personally, having a “cloud buddy” would have eliminated much of the excessive anxiety and experimentation in 2012, when I began the all-in journey to the cloud as CIO at Edmunds.com, one of the largest car shopping websites in North America.

But, unlike today, it was difficult back then to find extensive buddy system resources from other companies that were successfully migrating. All-in reference cases at scale (besides Netflix), managed migration programs, or a mature consulting partner ecosystem would have made life much easier. Fortunately, the current abundance of people, process, and technology now focused on cloud migrations means that an organization never has to go it alone like we did at Edmunds.com; it also means that the level of expertise in accelerating cloud adoption and maximizing cost savings makes all-in strategies more viable than ever.

As a lifelong surfer, I can tell you that the notion of a buddy system when surfing is not a thing, even in dangerous conditions. Going it alone is considered the ultimate soulful pursuit. However, these days, I prefer to take a more practical approach when I’m traveling to a new part of the world to surf. I try to find a “buddy” who has been there and can tell me everything I need to know about the waves before I paddle out. How shallow is the reef? Is it sharky? What tide is best? Hearing this advice, and benefiting from this experience, reduces my anxiety (usually) and makes for a higher-quality experience.

I recently joined a group of former CIOs who make up the Enterprise Strategy team at AWS. Our objective is to help technology executives think through and craft their cloud-first strategies, and one of the ways we do this is by inventing and simplifying new migration acceleration programs that take advantage of our accumulated knowledge (there’s no compression algorithm for experience). As former CIOs and AWS customers, we’ve led our own cloud migrations and helped transform businesses of all shapes and sizes in the process, and our stories are much like the tips and advice I get from my surf buddies when tackling a new wave.

In retrospect, there were three major revelations for me from the Edmunds.com migration story. As you’ll see, even though we shut down the last Edmunds.com data center in early 2016, the process we went through still tracks closely with the stages of cloud adoption that most enterprise migrations are experiencing today:

We’re Going to Completely Abandon This High-Performing Data Center Operation

Actually, that’s not the precise thought. As CIO at the time, my primary objective was to deliver technology capabilities that stayed ahead of business demand. For the seven years leading up to the cloud migration, we had worked tirelessly to develop what was considered a highly efficient infrastructure operation and set of DevOps practices. But that efficiency came at a cost to the business, even though it provided daily automated releases and unprecedented reliability. The cost stemmed from allocating more and more of the company’s finite resources to support code (private clouds and DevOps toolsets) and not enough to customer-facing application code (new customer features and services). We needed a new paradigm to keep the support code-to-customer code ratio in check without sacrificing any capabilities.

The emerging cloud momentum in 2011/2012 offered an alternative with claims that the public cloud, and AWS’ scale in particular, could provide better infrastructure and higher-level services at a more competitive price point than you could achieve as an individual company. However, the actual picture was much more “cloudy,” and there was no shortage of press declaring the cloud to be more expensive and less reliable than proven on-premises installations. The early adoption of AWS by Netflix gave substantial credibility to the argument that larger, more established businesses could run critical operations in the cloud; but, at the time, we were unable to identify a more apples-to-apples reference implementation for the Edmunds.com business.

A peer reference seemed vitally important back then because there were no buddy system migration resources to speak of that would help us leverage proven cloud adoption patterns.

Lacking any of this knowledge, we built our business case in two steps that have, in time, become standard practice for any company’s cloud migration:

  1. A proof of concept project that demonstrated the viability of running our critical operations in the cloud.
  2. A realistic financial model of an all-in cloud operation that would stand up to scrutiny over time and show at least cost parity with (or savings over) the current infrastructure spend profile.

In hindsight, deciding to stand up a full version of the core Edmunds.com website to demonstrate cloud viability wasn’t the quickest or easiest choice for a proof of concept. But, after nearly six months of trial and error by a couple of dedicated engineers, we had incontrovertible proof for even the biggest naysayers that the cloud was a real option for Edmunds.com. Today, AWS and good system integrators have developed buddy system approaches, such as landing zones, a component of the AWS Cloud Adoption Framework, to get migration business cases ramped up much more quickly than we did.

We felt good enough about the results to take the next step with a deep dive into cloud economics. At this point, we hadn’t yet discovered the massive leaps in productivity made possible through cloud native architecture adoption, but we were confident that moving to the cloud probably wouldn’t destroy the company.

Developing the financial model seemed like an equally daunting hurdle. The model had to be real, and it had to refrain from aggressive optimization assumptions. We were already efficient and frugal, and I honestly didn’t know if we would end up with an operating expense increase or decrease. So I was a little surprised, after more than a month of analysis, that we had a conservative model demonstrating modest operating expense savings once we completed a two-year migration plan to AWS (synchronized with a primary data center lease expiration). This was in addition to the millions that would be saved each year on capital equipment expenditures. The plan was also a bit of a sandbag because it was based on pure lift-and-shift assumptions. We were confident that we could deliver substantially more OpEx savings during the migration, but we didn’t know how to prove it upfront.

The favorable financial model and a strong proof of concept result led to an exciting and warmly received presentation to the CEO, who promptly approved our AWS migration recommendation.

One stylistic error I’ll point out was my over-emphasis on free cash flow savings. I almost entirely ignored the favorable OpEx position, assuming that was a prerequisite for approval; instead, I chose to focus on the bigger combined cash savings figure. It turned out, though, that backing up the OpEx forecast assumptions was more important to the CEO.

Sharpening the business case messaging and the ability to forecast deeper cloud savings is now the specialized focus of the AWS Cloud Economics group. But that team, which assists customers with migration and TCO modeling using proven techniques, wasn’t fully formed at the time. Today, the Cloud Economics group at AWS offers some of the best buddy system resources early on in a cloud journey because it has data on thousands of migrations, and these numbers help predict and quantify the savings from maximizing server utilization and workforce productivity as part of a business case.

I Think We’re Actually Going to Pull This Off

It really wasn’t that tenuous. But any project touching every single application and system carries a fair amount of risk, and there’s never any grace for delays or disruptions due to back-end enhancements. Once applications and data are being migrated, however, you begin to realize that the biggest organizational risk has nothing to do with outages or performance issues. It’s finishing. Getting stuck mid-migration not only impacts your cloud TCO model, but it can become a prolonged distraction from the company’s priorities.

As I mentioned earlier, it’s really critical to “buddy up” as early as possible in your cloud migration. Programs like the AWS Migration Acceleration Program (MAP) and tools such as AWS Database Migration Service (DMS) are, respectively, broad and targeted examples of the vast amount of buddy system resources that have been created to avoid many of the challenges we faced during the Edmunds.com migration. These resources have been developed with input and experience derived from thousands of customer migrations, and the programs and tools include an extensive list of proven migration patterns, such as moving from Oracle to Amazon RDS, the managed database service.

That said, we did learn a few valuable things as we felt our way through a solo migration. And I believe these learnings are key for any cloud-driven organization wanting to glide past the finish line with vigor instead of crawling out of a nightmare —

  1. Adjusting your initial migration principles is NOT a slippery slope to compromised architecture or failure. Your cloud migration strategy needs to have principles that are flexible enough to adapt to cloud agility as well as your newfound experience of working in the cloud. We actually started the Edmunds.com migration with a principle to only leverage core compute (EC2) and storage (S3, EBS). But this was due to a lack of familiarity with higher-level AWS services like Amazon RDS, Amazon CloudWatch and Amazon DynamoDB. We very quickly realized the integration and cost benefits of these new cloud native services, including the ability to spend more time on customer code. Today, Edmunds.com is leveraging more than three dozen AWS services.
  2. Use the two-week rule for refactoring decisions. We started our two-year migration plan with a flexible principle that allowed us to be opportunistic about refactoring; but nothing could delay the two-year target due to the data center lease expiration, so lift-and-shift often became the default. Once the migration was more fully underway, however, the team developed a specific two-week rule of thumb that it still uses today. If we could refactor a sub-optimal component or service in our stack within two weeks, we would refactor versus lift-and-shift. For example, the NFS-based shared storage architecture was high on the refactor list, but it didn’t comply with the two-week rule, so it got scheduled at the end of the migration window. On the other hand, many things — like load-balancing, caching, OS distribution, and DNS — were refactored during the migration using the new two-week rule. Depending on your migration timeline, or development cycle, you might want to use a different time period, but two weeks, or one development sprint, was the optimal constraint for Edmunds.com. A good buddy system resource here is the AWS Application Discovery Service, which systems integrators use to help companies identify and map the dependencies of applications before determining the best candidates for simple lift-and-shift or opportunistic refactoring. And now you can track your migration status in the recently released AWS Migration Hub. I’ll provide additional thinking about the two-week rule in a future post. But, in the meantime, be sure to read “6 Strategies for Migrating Applications to the Cloud” by Stephen Orban, Global Head of Enterprise Strategy at AWS. This illuminating post provides a very useful construct.
  3. You don’t need to dump your current team and hire a group of cloud all-stars. Edmunds.com didn’t hire a single employee specifically for the cloud migration, let alone a “cloud specialist.” The lesson here was establishing clear leadership and the equivalent of a Cloud Center of Excellence (CCoE) with well-defined objectives and key results. The head of our cloud migration team, Ajit Zadgaonkar, was originally hired to lead our automated testing team (SDETs). His team already had experience collaborating with the traditional Ops team on automated provisioning and Continuous Integration and Delivery. Again, Stephen Orban has written on this topic, and I recommend his post, “You Already Have the People You Need to Succeed With the Cloud.” The other important consideration here is that you are making a choice between new cloud/DevOps engineers who know nothing about your environment and your existing team that has years of tribal knowledge on the dependencies, flows, and business requirements of your critical applications. My AWS colleague, Jonathan Allen, has detailed the process he went through at Capital One to train and prepare his existing team for the cloud.
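For readers who like to see a rule of thumb written down, the two-week rule from item 2 might be sketched roughly as follows. This is only an illustration in Python, not anything Edmunds.com actually ran: the `Component` type, the ten-working-day budget, and every effort estimate below are made-up assumptions.

```python
from dataclasses import dataclass

# Assumption: "two weeks" is treated as one sprint of ten working days.
REFACTOR_BUDGET_DAYS = 10


@dataclass
class Component:
    name: str
    refactor_estimate_days: int  # hypothetical estimate, not real project data


def decide(component: Component, budget_days: int = REFACTOR_BUDGET_DAYS) -> str:
    """Refactor now if the estimate fits the budget; otherwise lift-and-shift
    and defer the refactor to later in the migration window."""
    if component.refactor_estimate_days <= budget_days:
        return "refactor"
    return "lift-and-shift"


# Illustrative inventory; the estimates are invented for the example.
inventory = [
    Component("dns", 3),
    Component("load-balancing", 8),
    Component("nfs-shared-storage", 30),
]

plan = {c.name: decide(c) for c in inventory}
for name, decision in plan.items():
    print(f"{name}: {decision}")
```

The threshold is a parameter because, as noted above, a different migration timeline or development cycle may call for a different time period.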

Getting the people and culture components right is as important as the technology decisions you make because they enable the internal buddy system that will ensure consistency across the organization as a migration accelerates and touches more applications.

I’m Glad I Didn’t Screw My Future Self

This third — and final — revelation was more of an epilogue to the migration from my new position and perspective in the organization. Shortly after we completed the move to AWS, I transitioned from COO/CIO and became Edmunds’ first Chief Digital Officer. In this role, I focused on developing a next-generation advertising platform and bringing new business models to market, such as online auto retailing and messaging applications. So I went from providing cloud services to consuming them. And, looking back, I was definitely a demanding customer!

After all of its applications and data had migrated to AWS on schedule, Edmunds.com was able to cut IT expenditures by 30%. The team achieved even greater savings by starting to optimize or rethink every component of the stack with a cloud native architecture (auto-scaling, microservices, ad hoc compute), or by simply replacing components outright with an AWS service. Many of the initiatives that my new teams were working on had a technology profile that looked nothing like what was initially migrated to AWS, and, in some cases, they were entirely serverless. There is already a burgeoning buddy ecosystem for serverless architectures, and it’s changing the math on comparisons between cloud and on-premises installations.

These new AWS services, like AWS Lambda, AWS Elastic Beanstalk, Amazon Kinesis and AWS Glue, could never have been rationally developed internally by Edmunds.com, but they delivered new capabilities for customers at a previously unimaginable rate. And, looking forward, the gap in what can be accomplished in your data center versus cloud native services is only increasing. For example, the mainstreaming of machine learning and artificial intelligence requires very different technology profiles than your run-of-the-mill web application. Maintaining those distinct skill sets and specialized compute capacity in house is far from the best option for most organizations.

I’ll keep offering insights and advice like this in my future posts. The goal, of course, is to help get you on the path so you can reinvent your own technology and business.

In the meantime, try using the buddy system to start or finish your migration as quickly as possible. Then you’ll be able to reduce your support code obligations for maintaining infrastructure and deliver more customer code using cloud native services.

The biggest changes start with simple steps —

Philip Potloff
@philippotloff
potloff@amazon.com
Enterprise Strategist