AWS Cloud Enterprise Strategy Blog

Decisions at the Margins

The annual budgeting process leads us to make decisions based on total costs. We assign a budget to a cost category, then spend the year managing to that cost. Our investment management process similarly leads us to think in terms of total costs. We plan against a set of requirements, estimate the total cost, figure out the return on that spending, build a business case around that return versus investment, and make a go/no-go decision. When we put together a set of requirements for an IT system, we don’t generally do it as a menu of individual line items with costs; we estimate the cost of the total package, then invest in it as a whole.

But the digital world—in more ways than one—is a world of making spending and production decisions at the margins. Instead of making a one-time spending or investment decision based on a total objective, we are better off making incremental decisions continuously over time based on marginal values. This is, I think, a very important principle for finance in the digital age.

Let me give a few examples to make this clearer. I first realized that this principle was operative in thinking about performance requirements for IT systems. In the government, where we had a lot of documentation to prepare for each IT system, we had to decide on and document non-functional requirements before a project began. This included things like the expected response time and availability of the system. Similarly, companies often set service level agreements (SLAs) and objectives for disaster recovery and business continuity. In theory—though I’m not sure if I’ve seen this well implemented—we do so based on a business case; for example, how much money will be lost for each hour a system is down tells us how much we would be willing to spend on increasing availability for another hour.

But this way of thinking is simply wrong, or at least has become wrong in the digital age. Here’s why: these spending decisions should be based on marginal values. In other words, if the requirement says that response time should be 600 milliseconds and we have gotten the response time to 650 milliseconds, then we should decide whether to gain the additional 50 milliseconds based on how much it will cost versus how much value it will bring. Those incremental 50 milliseconds might cost us $1 or might cost us $10 million; surely the decision would often be different in those two cases. Or what if an additional $100 of spend would decrease response time to 200 milliseconds? Should we consider it, or would that be “wasting” our money since the requirement was only 600 milliseconds?
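
To make the comparison concrete, here is a minimal sketch of the marginal test described above. The $1 and $10 million costs come from the example; the $5,000 value placed on the last 50 milliseconds is purely an assumption for illustration, as are the names `Improvement` and `worth_doing`.

```python
from dataclasses import dataclass


@dataclass
class Improvement:
    name: str
    marginal_cost: float   # dollars needed to make this one change
    marginal_value: float  # estimated dollars of benefit from this one change


def worth_doing(option: Improvement) -> bool:
    """Fund the change only when its estimated benefit exceeds its cost."""
    return option.marginal_value > option.marginal_cost


# Assumed value estimate: the last 50 ms is worth $5,000 to the business.
cheap = Improvement("650 ms -> 600 ms", marginal_cost=1, marginal_value=5_000)
costly = Improvement("650 ms -> 600 ms", marginal_cost=10_000_000, marginal_value=5_000)

decisions = [worth_doing(cheap), worth_doing(costly)]
print(decisions)  # [True, False]
```

The written requirement (600 milliseconds) never appears in the test. Only the cost and value of the next increment matter, which is why the same 50 milliseconds can be a yes in one case and a no in the other.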

The big development that has changed our optimal way of making these decisions is the move to DevOps and agile ways of delivering IT systems, where we deliver finished features quickly and incrementally, then get feedback and possibly alter our plans based on it. By feedback I mean not only what users say but also the data and measurements we can collect. In the response time case, let’s say that our minimum viable product had that response time of 650 milliseconds but the users thought it was fine, or our measurements showed that it took users orders of magnitude longer than 650 milliseconds to process the information in their heads. Why wouldn’t we avoid spending money on reaching 600 milliseconds? In the old world, we had to make these decisions before the feedback was available. In the world of cloud, Agile, and DevOps, we should make the decisions at the margins instead.

In fact, this is the reason why we build minimum viable products in the first place—it is really a strategy to allow us to make decisions at the margin. In the old waterfall world, we avoided scope creep by cramming a lot of requirements into the upfront requirements document. The result was feature bloat—more features than we absolutely needed. In the DevOps world we can take the opposite way of working. First, decide what our business objective is and how we will measure it. Then build the absolute minimum viable product that will move in the direction of that objective. Then build incrementally on top of that minimal product until the objective is fully accomplished, building the most valuable features first. Then stop!

This turns a total spend decision into a marginal spend decision. Instead of trying to estimate the value and cost of all the features together, we estimate the cost and value of the next incremental feature we are planning to add, and decide whether the marginal feature is a good investment, given the value that has already been delivered. And that is the trick to marginal decisions—when we make them, we have new information available to us: how much value has already been delivered. That information makes our decision better.
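
The build-then-stop loop can be sketched in a few lines. This is only an illustration under simplifying assumptions: the backlog, its cost and value estimates, and the function name `build_incrementally` are all hypothetical, and in practice the estimates would be revised after each delivery rather than fixed up front.

```python
def build_incrementally(backlog, objective_value):
    """Pick features one at a time, most valuable first, and stop early.

    backlog: list of (name, estimated_cost, estimated_value) tuples.
    objective_value: value at which the business objective counts as met.
    """
    delivered, total_value, total_cost = [], 0.0, 0.0
    for name, cost, value in sorted(backlog, key=lambda f: f[2], reverse=True):
        if total_value >= objective_value:
            break  # objective accomplished: stop building
        if value <= cost:
            continue  # this feature does not pay for itself at the margin
        delivered.append(name)
        total_value += value
        total_cost += cost
    return delivered, total_value, total_cost


# Hypothetical backlog: (feature, estimated cost, estimated value)
backlog = [
    ("search", 20, 100),
    ("export", 30, 40),
    ("themes", 25, 10),
    ("audit log", 15, 60),
]
delivered, total_value, total_cost = build_incrementally(backlog, objective_value=150)
print(delivered, total_value, total_cost)  # ['search', 'audit log'] 160.0 35.0
```

Each pass through the loop uses the value already delivered, which is exactly the new information a total-cost business case does not have.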

Let’s take another interesting example. Remember the Healthcare.gov fiasco? The government’s website was more successful than expected—that is, more people tried to use it than the system was prepared for. As a result, the system crashed and was unavailable for months. If the system had been built in the cloud, then it would have been able to scale as much as necessary to meet the demand. With good cloud practices, the system could have increased its capacity virtually immediately, without customers even noticing the difference.

Of course, that would have meant higher costs, since cloud costs depend on the amount of infrastructure used. The budget would have been for the expected number of users, and the actual number of users would have been higher. So now put your own company in Healthcare.gov’s place. You release a new digital product and budget $1,000 a day for the infrastructure. Suddenly, usage hits that $1,000 a day and demand keeps growing, because the product is so much more successful than you expected. Do you put a cap on the spending (which you absolutely can do in the cloud)? If so, you will be turning away customers.[1] Or do you go ahead and exceed the budget?

At least you have the choice. If you had built your own infrastructure on-premises, you would be stuck: it would take a long time for you to buy and configure new infrastructure. But in the cloud you can choose. What is your choice?

I think it is another marginal decision. If the marginal value of those extra customers exceeds the marginal cost of the additional infrastructure (and assuming cash is available), then you add the infrastructure. This is a variable cost (see the related post on micro-optimization), which may change with the number of customers, so make the best marginal decision. If there are network effects, then the value is probably increasing non-linearly and this decision is a no-brainer. If there are diminishing returns to new customers (hopefully not) then you might want to consider turning some away. Otherwise, you can make your decision based on the incremental transaction value as I described in the other post.

In other words, if an incremental dollar of spend brings more than an incremental dollar of benefit, then you probably want to spend the incremental dollar, regardless of budget or expectations. This is one of the fundamental ideas of the Beyond Budgeting movement,[2] and applies to the business in general, not just IT infrastructure spend.
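
As a rough sketch of the infrastructure choice above, the function below keeps approving capacity increments for as long as each one brings in more than it costs and cash allows. The $1,000 daily budget is from the example; the increment figures, the cash ceiling, and the name `daily_spend_to_approve` are assumptions made up for illustration.

```python
def daily_spend_to_approve(budget, increments, cash_available):
    """Return the daily infrastructure spend worth approving.

    budget: the planned daily spend, already committed.
    increments: ordered list of (extra_cost_per_day, extra_value_per_day),
        one entry per additional block of capacity.
    cash_available: the most we are able to spend per day.
    """
    spend = budget
    for extra_cost, extra_value in increments:
        if extra_value <= extra_cost:
            break  # the next dollar no longer returns more than a dollar
        if spend + extra_cost > cash_available:
            break  # worth doing, but the cash is not there
        spend += extra_cost
    return spend


# Assumed figures showing diminishing returns to additional customers.
increments = [(500, 2_000), (500, 900), (500, 400)]
approved = daily_spend_to_approve(budget=1_000, increments=increments, cash_available=5_000)
print(approved)  # 2000
```

Here the original budget is only a starting point. Spending stops at $2,000 a day because the third increment would cost more than it brings in, not because a planning number was reached.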

As we change from the old, monolithic, waterfall-based way of thinking about IT, we should also change from the old estimated total cost versus estimated total returns way of thinking about IT finance. Fast, incremental delivery allows for fast, incremental financial decision making, which in turn leads to better financial performance. The principle applies to both sides of the DevOps equation, development and operations—develop marginal features and operate a marginal infrastructure. The digital world is a world of decision making at the margins.

Mark

@schwartz_cio
A Seat at the Table: IT Leadership in the Age of Agility
The Art of Business Value
War and Peace and IT: Business Leadership, Technology, and Success in the Digital Age (now available for pre-order!)

[1]Yes, I’m oversimplifying. The result instead might be that you slow the system down for everyone. I’m assuming that you “throttle” the number of users and keep performance the same, just to have one less variable to work with here.

[2] See, for example, Bjarte Bogsnes, Implementing Beyond Budgeting: Unlocking the Performance Potential (Hoboken, NJ: Wiley & Sons, 2009).

Mark Schwartz

Mark Schwartz is an Enterprise Strategist at Amazon Web Services and the author of The Art of Business Value and A Seat at the Table: IT Leadership in the Age of Agility. Before joining AWS he was the CIO of US Citizenship and Immigration Service (part of the Department of Homeland Security), CIO of Intrax, and CEO of Auctiva. He has an MBA from Wharton, a BS in Computer Science from Yale, and an MA in Philosophy from Yale.