Untangling Your Organisational Hairball: Well-Controlled

“One of the most common sources of org debt is the knee-jerk reaction. Every time something goes wrong, we immediately jump to create a constraint to prevent future mistakes. So we institute a new role, rule, or process. As a result over a decade or two, a one-step process becomes twenty steps. Or five different processes become entangled. Or ten different roles become approvers for a simple decision.”

—Aaron Dignan, author of Brave New Work

Controls have existed since ancient times. In Ptolemaic Egypt, from 323 BCE to 30 BCE, there was a dual administration, with one set of bureaucrats charged with collecting taxes and another with supervising them. Internal controls are not new; they have simply had different names and drivers over the millennia.

Modern organisations have never had a more crucial need for controls. But it is critical to implement controls without constraining agility. Commercial and public sector AWS customers I meet typically use an eclectic mix of existing controls for their business or mission. In one way this is perfectly fine; each business is different, and so is how each business interprets differences.

What isn’t fine is neglecting to regularly assess those perceived controls, because they contribute to your Bureaucratic Mass Index (BMI): the sum of non-value-adding time and waiting time, divided by total available time. BMI manifests at stand-ups when teams are blocked. One of the most impactful questions I asked as a CTO before moving to the cloud was, “How much time have you lost waiting for other teams this week?” Shockingly, this truth-seeking question was often answered by staring at shoes and saying, “Quite a bit.”

Ouch.

This is a leadership challenge, not a specific team challenge. Only leaders can be expected to fix this.
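To make the BMI ratio concrete, here is a minimal sketch in Python. It assumes the reading of the formula in which non-value-adding time and waiting time are summed first and then divided by total available time. The function name, parameter names, and the sample hours are hypothetical and purely illustrative.

```python
def bureaucratic_mass_index(non_value_adding_hours: float,
                            waiting_hours: float,
                            total_available_hours: float) -> float:
    """Return BMI as a fraction between 0 and 1.

    Assumed reading of the formula:
    (non-value-adding time + waiting time) / total available time.
    """
    if total_available_hours <= 0:
        raise ValueError("total_available_hours must be positive")
    return (non_value_adding_hours + waiting_hours) / total_available_hours


# Hypothetical week for one engineer: 40 hours available,
# 6 hours of non-value-adding work and 4 hours waiting on other teams.
bmi = bureaucratic_mass_index(6, 4, 40)
print(f"BMI: {bmi:.0%}")  # prints "BMI: 25%"
```

Tracking this number per team, week over week, turns “Quite a bit” into something a leader can act on.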

Diving deeper reveals a spider’s web of answers, many related to breakdowns from the past, such as perceived architecture failings, critical incidents, process breakdowns, audit failings, single points of success/failure, and, sadly, a fundamental lack of trust between teams.

“We can’t give Team X access to Task Y because we don’t trust them to do it correctly!”

Double ouch!

We cover these organisational wounds with Band-Aids, hiding the eventual build-up of hardened, immovable scar tissue. The unnecessary bureaucracy driving BMI slowdown is often the result of controls that remain in place long after they have stopped being appropriate.

One of the fundamental benefits of moving to the cloud is transforming the siloed teams of the past into the diversely skilled, business outcome-driven teams of today and tomorrow. Your teams don’t have to go to fifteen different silos because they can leverage 200+ API-enabled AWS services. However, if you don’t review your existing controls for appropriateness when moving this way, your teams will stay in the slow lane.

Enterprise Architecture

It’s never been more important to ensure the components you select to use or build add critical value and are secure, durable, available, fast, scalable, and cost-effective. How architecture is actually performed is even more critical.

The Enterprise Strategy team works with many AWS customers who still have IT architecture teams sitting in “ivory waterfall towers.” They work on the critical architecture element of the infamous IT waterfall delivery methodology. Architecture is researched in a forlorn attempt to address all known constraints and provide a well-thought-out, joined-up technology plan for downstream teams to implement. This team’s role is to find the “least worst” trade-off choice when designing. But in small, outcome-facing teams, this is an anti-pattern. It is neither agile nor conducive to ownership for one team to wait while another that won’t own the on-call pager endlessly researches the least worst option’s technology choices.

AWS operates within a two-pizza team model. There are no official, internal-facing roles for “architects.” Instead engineers at all levels have continual ownership to build, support, and iterate their services. This way they are responsible for continually refining their architecture, including swapping out elements if needed. Within this software development engineer family is the principal SDE role. This individual has some serious engineering chops. They provide the engineering linchpin for the team and are expected to engage with “p-level” engineers in other teams when they need to collaborate and solve problems holistically. It is not unheard of for managers to say, “A bunch of p’s are working on it” when talking about a particularly gnarly issue.

The critical question I ask AWS customers (and the one asked of me when I was an AWS customer) is “What is stopping you from moving that architecture responsibility into the business outcomes teams directly?” AWS customers often answer that the sheer interconnectedness of distributed systems (and the modern complex hybrid on-premises/cloud) needs somebody who can see the interconnected picture and ensure that the architecture is considered. My response to that is in two parts.

First, what is stopping you from putting business outcome teams that have the most dependencies together, perhaps in the same release train, even under the same delivery executive? They no longer have to be together physically, but they should probably plan together, working to enable and manage the collective understanding of their interdependencies. Second, what is stopping you from embedding these architects into the teams? Asking them to collaborate to guide the architecture means ensuring the teams can build, support, and iterate continually. One team making critical technology choices and then, when things go wrong, waking a different team to fix it is not conducive to ownership or success.

Complementary Skills

Software developers are busy folks. They don’t necessarily have the time, ability, or desire to learn cloud services immediately. Asking them to become proficient at using and operating cloud services is challenging, especially when they are used to raising tickets and receiving service from a central infrastructure team. One of the key enabling activities to overcome this challenge is assigning a cloud engineer to that business outcome team (permanently, if necessary). Using AWS services effectively and provisioning them via infrastructure as code (IaC) is a crucial learned skill. Way back in 2017, I wrote a blog post about the criticality of cloud skills and mechanisms you can plagiarise to scale your own way.

Some of the best cloud engineers I’ve had the privilege of working with have worked in both software and infrastructure roles. They recognise the similar but different approaches in both roles. I am dramatically oversimplifying, but software engineers are often focused on logic, data structures, messaging, caching, performance, and UX (obviously front-end developers). Former on-premises infrastructure engineers are often obsessed with IP addressing, Microsoft Active Directory, backups, disaster recovery, Amazon Machine Images, and other configurable infrastructure items. The skills needed to create YAML/JSON formatted AWS CloudFormation templates are not dissimilar to those needed to create and maintain a Unix cron job or a Cisco configuration file. Complementary skills eliminate dependencies on other central teams. Putting these folks together in one team is a powerful enabler of speed.

But when doing this, we need to “Trust, but verify” what they are doing. Enter Controls.

Compliance with Controls and Control Objectives

Highly regulated entities leverage controls to ensure adherence to a compliance framework. A control is a high-level rule that provides ongoing governance for your overall AWS environment, expressed in plain language. Companies that must comply with multiple compliance frameworks and assertions have a compelling reason to use AWS Control Tower, which implements preventive, detective, and proactive controls that help you govern your resources and monitor compliance across groups of AWS accounts. There was a great talk at re:Invent 2022 called Setting up Controls at Scale in your AWS environment, which shows the critical foundational role that AWS Control Tower has in helping all customers get going the right way. I would strongly encourage you to view the breakout and essential user guide that goes deeper here. The goal is to ensure that business outcome teams have a range of controls that operate together to ensure teams can stay compliant, removing even more blockers from the organisation.
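To give a feel for what “trust, but verify” can look like in code, here is a minimal sketch of a proactive-style check: it inspects a CloudFormation-style template before anything is deployed and flags S3 buckets without versioning enabled. This is a simplified stand-in written for illustration, not how AWS Control Tower implements its controls. The function name, the rule itself, and the sample template are all hypothetical; only the resource type `AWS::S3::Bucket` and its `VersioningConfiguration` property come from CloudFormation.

```python
from typing import Any, Dict, List


def find_unversioned_buckets(template: Dict[str, Any]) -> List[str]:
    """Return logical IDs of S3 buckets that do not enable versioning.

    Hypothetical control: every AWS::S3::Bucket must set
    VersioningConfiguration.Status to "Enabled".
    """
    violations = []
    for logical_id, resource in template.get("Resources", {}).items():
        if resource.get("Type") != "AWS::S3::Bucket":
            continue  # the rule only applies to S3 buckets
        properties = resource.get("Properties", {})
        versioning = properties.get("VersioningConfiguration", {})
        if versioning.get("Status") != "Enabled":
            violations.append(logical_id)
    return sorted(violations)


# Hypothetical template, expressed as a Python dict for brevity.
sample_template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "AuditLogs": {
            "Type": "AWS::S3::Bucket",
            "Properties": {"VersioningConfiguration": {"Status": "Enabled"}},
        },
        "Scratch": {"Type": "AWS::S3::Bucket"},
        "Orders": {"Type": "AWS::SQS::Queue"},
    },
}

print(find_unversioned_buckets(sample_template))  # prints ['Scratch']
```

The point of a check like this is that the team gets the answer in seconds, inside its own pipeline, rather than waiting for another team to review a ticket.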

Segregation of Duty

Some organisations rightly prevent their developers from accessing production infrastructure and data stores, often to achieve segregation of duties. In his re:Invent 2022 presentation about running services without access to data, AWS VP and Distinguished Engineer Colm MacCárthaigh quotes Jerome Saltzer: “Every program and every privileged user of the system should operate using the least amount of privilege necessary to complete the job.” Colm expands on the many mechanisms AWS has implemented to achieve this. (I particularly love “Customer data is radioactive to us.”) I also really like how the governance approach continues to mature with choice. The GoDaddy story in the 2023 re:Invent breakout, Governance and security with infrastructure as code, explains their approach in detail. The accompanying presentation from colleagues Eric Beard and Kevin DeJong shows how Amazon handles code pushes, the principles that enable this outcome, account boundaries, how code is reviewed, and how deployment is automated afterwards. It is essential viewing in my opinion.

Conclusion

Every time a team is blocked or slowed down, it is crucial to ask why. Ensure teams can recognise when they are blocking others and actively work to remove human dependency. The principles you adopt as a leader are vital to meeting this goal. Speed is an executive choice. In 2023, Phil Le-Brun, Matthias Patzak, and I had the privilege of creating the new AWS breakout presentation ‘Untangling your organizational hairball’ and presenting it at the AWS re:Invent Executive Summit. The recording of my presentation is available here and covers the content of all the blogs in this series.

“All of your assumed constraints are debatable”

Jonathan

AWS Cloud Enterprise Strategy Blog