The Great Cloud Refactoring Debate

This post is the first in a multi-part series on how enterprises balance refactoring desires against time to value in cloud migrations.

With cloud adoption becoming the new normal for the enterprise, the next-level debate is now about the specific approach to take when migrating to the cloud. While it is usually not difficult to achieve consensus on a “cloud first” strategy for all new capabilities delivered, rarely do the same parties easily agree on a common approach to the hundreds or thousands of existing applications that still reside in company data centers. Teams responsible for effecting an enterprise’s transition to the cloud might initially find it easier to apply a one-size-fits-all approach and cut through devolving debates about risks and dependencies, but that approach could also jeopardize the trust and cooperation of the application owners they are asking to migrate.

Yet many enterprise cloud teams have succeeded in earning this trust and cooperation while also delivering on aggressive timelines in so-called “lift-and-shift” cloud migrations, where the goal is to move a specific set of applications to the cloud as fast as possible without changing their core architecture, functionality, or performance characteristics. This is no small challenge: some application development teams believe their software will require substantial refactoring to run in the cloud, while others don’t want to bring their technical debt into a pristine new cloud environment.

The fundamental notion of software refactoring is to change application code for the better. Martin Fowler has defined refactoring as “a change made to the internal structure of the software to make it easier to understand and cheaper to modify without changing its observable behavior.”

When it comes to enterprises trying to figure out their best strategy for cloud adoption, the refactoring debate is where a lot of aspirational cloud plans get dampened.

Climbing up on the soapbox of experience for a minute: the enterprise cloud migrations that most often struggle to meet expectations seem to be those lacking a clear mandate to achieve specific, measurable outcomes for each application. The result is a casual “move when ready” strategy for legacy applications, with each team responsible for some subjective level of necessary refactoring. Since most teams already face a demanding roadmap of new product and feature requests, even teams with good intentions often find it more comfortable to put off migrating an application when no firm timetable exists.

Climbing off the soapbox: even enterprises that have dictated a firm lift-and-shift migration strategy need to allow for certain types of cloud-native changes to applications moving out of a corporate data center. These changes typically involve replacing on-premises components such as logging and load balancing with cloud-native equivalents, adding elasticity, or simply upgrading an operating system. The question becomes, how much is enough?

My former company, Edmunds.com, is one example of an enterprise that undertook a large-scale, time-compressed migration to AWS while also taking a balanced approach to refactoring. The proverb goes “necessity is the mother of invention,” and with less than two and a half years to migrate hundreds of applications and databases, the cloud migration and engineering teams had to develop a plan that would not buckle under pressure from teams that wanted to expend a lot of upfront effort refactoring their software. Their brilliant yet simple covenant with application owners was called the “Two-Week Rule.” It’s what we at Amazon would call a mechanism.

Mechanisms at Amazon are used to solve recurring problems, drive consistent outcomes, and help leaders guide the organization at scale. Mechanisms are complete processes that transform a set of inputs into a set of desired outputs. Our press release writing is a mechanism for internally sharing and evaluating new ideas, and the Andon Cord mechanism, adapted from lean manufacturing, ensures that customer-reported defects can trigger a halt in sales of defective items and a fix for the root cause.

Like many enterprises, most of the application development teams at Edmunds.com initially feared that their software would not run in the cloud effectively, or at all, without significant changes. The cloud migration teams would hear objections about incompatible storage architecture, unique caching requirements, or specialized network configurations. After working through several of these roadblocks without needing to make massive changes to legacy applications, the cloud teams began to see common patterns that could be applied to future migrations, giving themselves and their app dev partners confidence to proceed more quickly with application migrations.

Eventually, this was codified into the Two-Week Rule, a mechanism that gave each migrating application a two-week development sprint with the cloud migration teams to refactor the compatibility issues considered most problematic for the cloud. Over time, the cloud engineering team developed packages and deployment scripts to automate some of the most common changes, such as logging and DNS configurations. A good mechanism is a complete process that reinforces itself and gets better. The Edmunds.com teams were able to optimize the Two-Week Rule to the point where specific steps occurred on specific days in the ten-business-day window.

The Two-Week Rule is one of many approaches that enterprises have used as a forcing function to speed up their time to value in leveraging the cloud. Jake Burns, VP of Cloud Services at Live Nation Entertainment, used the principle of “minimum viable refactoring” to complete the migration of 90% of Live Nation’s on-premises infrastructure to the cloud in twelve months. These refactoring constraints are also meant to build muscle memory for Lean and Agile techniques that promote frequent and continuous refactoring before, during, and after migration to the cloud. In fact, the Two-Week Rule accounted for only a fraction of Edmunds.com’s overall application refactoring: in the two years since they shut down their last data center, large parts of the application portfolio have been completely overhauled to leverage cloud-native capabilities.

Enterprises that can develop a cultural principle of treating refactoring and modernization as a journey and not a destination may be less likely to feel that a cloud migration is their one shot to cure all ills.

In my next post in this series, “The Two-Week Rule: Refactoring Your Applications for the Cloud in 10 Days,” we will dive deeper into the Two-Week Rule with one of my former Edmunds.com colleagues.

Enjoy the refactoring journey.

Philip Potloff
Head of Enterprise Strategy, AWS
@philippotloff

AWS Cloud Enterprise Strategy Blog
