Governance in the AWS Cloud: The Right Balance Between Agility and Safety
By Paolo Latella, Cloud Practice Manager and APN Ambassador at Claranet Italia
Cloud infrastructure provides more agility and responsiveness than traditional IT environments. This requires organizations to think differently about how they design, build, and manage applications.
Cloud resources are disposable and billed on a pay-per-use model, which requires strong integration between IT governance and organizational governance. Builders need to be able to operate in a cloud environment that’s agile and safe at the same time.
In this post, I will introduce a decentralized model of cloud governance that can help you strike the right balance between agility and safety.
As the Amazon Web Services (AWS) practice manager at Claranet Italia, I help customers move to the cloud while reducing the time between ideas and production. In the last few years, Claranet has helped many customers move their workloads to AWS, from scaled-up startups to enterprise organizations. During this journey, we push for a cloud-native approach and the right governance model.
I am an Authorized Instructor Champion at AWS, and have been an APN Ambassador since 2018. APN Ambassadors work closely with AWS Solutions Architects to migrate, design, implement, and monitor AWS workloads.
Claranet is an AWS Premier Consulting Partner with AWS Competencies in DevOps, Migration, Data & Analytics, and Digital Customer Experience. Claranet is also a member of the AWS Managed Service Provider (MSP) Partner Program.
Governance in the Cloud
Governance is a mix of processes, people, and technologies that drive the cloud journey. In my experience, the principles of cloud governance and challenges organizations face are as follows.
- Drive cloud-native culture.
- Align cloud strategy to business objectives and IT strategies.
- Promote and leverage a Cloud Center of Excellence (CCoE).
- Democratize emerging technologies.
- Identify the right KPIs and ROI of cloud adoption.
- Identify and maintain required policies and compliance requirements.
- Define (cloud-ready) contracts with internal and external stakeholders.
- Support employees’ career paths with new skills.
- Adapt the security and compliance process leveraging the Shared Responsibility Model.
Other challenges faced by organizations on a cloud journey include adapting risk evaluation and mitigation strategies; avoiding technology proliferation and pushing to build reusable, cloud-native components; and turning staff from skeptics to true believers.
Agility and Safety
One of the biggest challenges a governance team faces during cloud adoption is to improve the enterprise’s agility without compromising safety.
Usually, the concept of cloud agility is quite clear, but safety often gets confused with cyber-security. In this post, I am using the term “safety” because it’s not only about (cyber) security.
Security is the protection of our cloud environment from information disclosure or service disruption, while safety is more than this. Safety also covers control over costs and the assessment of resource configurations.
Figure 1 – Cloud governance functions.
To define the right balance between agility and safety, the cloud governance team must define new rules, engage the right people, and manage dynamic cloud resources leveraging AWS Management and Governance services.
Most companies resolve this problem by adopting centralized governance (a cloud resource broker). This approach guarantees maximum control but reduces agility. We need to introduce a new concept of governance that’s more agile, based on DevOps principles, and therefore less centralized.
In a decentralized governance model, it’s essential to develop a critical mass of people with AWS experience, establish new operational processes, and leverage services designed to improve agility and guarantee safety.
The goal is to create a decentralized governance that permits great control over standards and costs (safety) while allowing better responsiveness to business needs (agility).
People and Process: The Cloud Center of Excellence
The right way to define and implement a decentralized model of governance is to bring together a group of cloud experts from within the organization to provide leadership, define best practices, and drive builders in cloud adoption.
There are many ways to structure a Cloud Center of Excellence, but in each of them we can identify two entities:
- Functional CCoE: More related to provisioning and operating of the cloud resources.
- Advisor CCoE: More related to cloud strategies and impact on the business.
Figure 2 – Structure of a Cloud Center of Excellence.
The functional CCoE is an “elastic” component of the team because it scales and reorganizes to meet the needs of the advisor CCoE and therefore of the business. The functional CCoE can start with two or three people and grow to include more team members.
In any case, the functional CCoE is responsible for:
- Defining best practices for designing and operating applications in the cloud.
- Providing centralized and shared services.
- Creating reusable and preconfigured resources.
- Implementing the right guardrails and related controls.
With regard to best practices, the Cloud Center of Excellence creates and maintains design principles and architectural best practices for designing and running applications in the cloud.
A good starting point is the AWS Well-Architected Framework, which is based on a set of foundational questions that allow you to understand if a specific architecture aligns well with AWS best practices. The framework is built on five pillars: operational excellence, reliability, performance efficiency, security, and cost optimization.
The CCoE must involve the builders in the definition of these new principles or practices (a bottom-up approach). Also, when new processes have been defined or new technologies have been adopted, the CCoE should organize webinars, workshops, or similar events with the goal of evangelizing the rest of the IT department.
Leveraging Shared Services and Templates
Use of shared services is one of the top architectural design principles. It permits builders to inherit standardized architecture during development of new services.
The CCoE should push to create shared services to:
- Separate the duties and responsibilities.
- Centralize cross-project services, enabling a single point of control.
For example, we can define a centralized logging solution with the goal of collecting, analyzing, and displaying the Amazon CloudWatch Logs of our spoke accounts in a single dashboard. Or, we can provide our builders with a cross-account CI/CD pipeline to simplify and standardize tasks and introduce a separation of duties. Builders can develop and test a new release, but deployment to production requires authorization.
The CCoE works with builders to produce company blueprints. These are reusable and preconfigured services that drive builders to adopt (reuse) a reference template in their projects. Inside the blueprints, we can find articles like those in the Amazon Builders’ Library, or templates provided by AWS Launch Wizard.
The CCoE creates and manages catalogues of pre-approved cloud resources. These can range from a group of Amazon Machine Images (AMIs) up to entire infrastructures.
For creating and managing a catalogue of AMIs, one practice is to set up a process that creates golden AMIs using a CI/CD pipeline together with Chef and Packer, or to use a managed service such as Amazon EC2 Image Builder. For creating and managing entire infrastructures, we could use AWS CloudFormation and AWS Service Catalog, which allow us to maintain catalogues of pre-approved infrastructures.
A decentralized model of governance works fine only if the CCoE simplifies the job of builders and allows the company to speed up the release of new products and services.
When the business launches a new product or service, all of the steps necessary to enable builders should be automated, from account creation (we use a multi-account environment for our projects) to the deployment of applications. Here, the degree of freedom can range all the way up to builders who are able to autonomously create cloud resources, including the accounts.
In any case, the Cloud Center of Excellence must define the account provisioning process, enable controls on compliance and costs, and launch a pre-approved set of configurations (for example, log centralization). These controls represent a perimeter that prevents builders from going off-road (guardrails).
Using AWS Control Tower or AWS Deployment Framework (ADF), the CCoE can set up and govern a multi-account AWS environment knowing accounts conform to best practices and that all defined guardrails are enabled.
Technologies: Controls Over Standards
To balance agility with safety, a decentralized model of governance requires a company govern its resources and monitor compliance across accounts. The Cloud Center of Excellence is responsible for defining and applying the right guardrails on accounts that developers create for their projects.
Usually, these guardrails are implemented as controls, such as:
- Directive controls: Define the guidance for design and build.
- Preventive controls: Protect workloads by preventing misconfigurations or vulnerabilities.
- Detective controls: Provide visibility and transparency over operations.
- Responsive controls: Mitigate errors or misconfigurations in real time.
Figure 3 – Safety vs. security.
Directive controls are implemented as best practices and design principles. Here, the CCoE has a relevant role because it’s responsible for engaging and evangelizing builders about these directives. For example, the CCoE should communicate to builders the importance of using an AWS Identity and Access Management (IAM) role to delegate access to users, applications, or services.
Preventive controls are implemented as a set of policies at the account level or organization level. For account-level control, IAM provides user-based policies and resource-based policies that define what a builder can or can’t do on a specific account.
For organization-level control, AWS Organizations is a service that groups accounts in organizational units (OUs) with the goal of defining the budgetary, security, and compliance needs of a specific group of accounts. For example, we can define three OUs, one for each environment (development, pre-production, and production), and define different preventive controls for each.
For each OU, the CCoE can define one or more service control policies (SCPs) with the goal of creating the right guardrails for the accounts in that OU. For example, the CCoE can force builders to apply a cost center tag to Amazon Elastic Compute Cloud (Amazon EC2) instances by applying an SCP that requires this tag in order to launch an instance.
Figure 4 – SCP and identity policy.
The configuration above is an example of the right balance between agility and safety. Builders have a policy that permits them to launch instances autonomously, but only if they follow the requirement (implemented by SCP) to put a tag on the instance. Remember, the SCP defined on AWS Organizations can be deployed automatically to a new account by AWS Control Tower.
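The interplay between the SCP and the builder's identity policy can be sketched in Python as plain policy documents. This is a minimal illustration rather than the exact configuration shown in the figure: the tag key `CostCenter`, the statement IDs, and the helper names are assumptions made for this example, and `launch_is_allowed` is only a toy model of the "explicit deny wins" rule, not the real IAM evaluation logic.

```python
import json

REQUIRED_TAG = "CostCenter"  # hypothetical tag key chosen for this example


def build_tag_enforcement_scp(tag_key: str) -> dict:
    """SCP that denies ec2:RunInstances when the request carries no such tag."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyRunInstancesWithoutTag",
                "Effect": "Deny",
                "Action": "ec2:RunInstances",
                "Resource": "arn:aws:ec2:*:*:instance/*",
                # "Null": "true" matches when the tag key is absent from the request.
                "Condition": {"Null": {f"aws:RequestTag/{tag_key}": "true"}},
            }
        ],
    }


def build_builder_identity_policy() -> dict:
    """Identity policy that lets builders launch (and tag) instances on their own."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowLaunchInstances",
                "Effect": "Allow",
                "Action": ["ec2:RunInstances", "ec2:CreateTags"],
                "Resource": "*",
            }
        ],
    }


def launch_is_allowed(request_tags: dict, tag_key: str = REQUIRED_TAG) -> bool:
    """Toy model: the identity policy allows, the SCP denies if the tag is missing."""
    identity_allows = True
    scp_denies = tag_key not in request_tags
    return identity_allows and not scp_denies


if __name__ == "__main__":
    print(json.dumps(build_tag_enforcement_scp(REQUIRED_TAG), indent=2))
```

In this sketch, a launch request tagged with `CostCenter` passes both policies, while an untagged request is stopped by the SCP even though the identity policy allows it.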
Detective controls are implemented as log analysis or event processing. Usually, the output of a detective control is an alarm or a metric on a dashboard.
The task of the CCoE is to define standards for log formats, push to centralize logging and event processing (the shared services), and, of course, enable log generation. The security pillar of AWS Well-Architected suggests the following best practices for this kind of control:
- Configure service and application logging.
- Analyze logs, findings, and metrics centrally.
- Automate response to events.
- Implement actionable security events.
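As one illustration of what a log-format standard might look like, the sketch below emits every log record as a single JSON object with a fixed set of fields, which makes central analysis easier. The field names (`timestamp`, `level`, `account_id`, `application`, `message`) and the class name are hypothetical choices for this example, not a format prescribed by AWS or by the framework.

```python
import json
import logging
from datetime import datetime, timezone


class StandardJsonFormatter(logging.Formatter):
    """Formats each record as one JSON object with a fixed (hypothetical) schema."""

    def __init__(self, account_id: str, application: str) -> None:
        super().__init__()
        self.account_id = account_id
        self.application = application

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            # UTC timestamp in ISO 8601 so records from all accounts sort together.
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "account_id": self.account_id,
            "application": self.application,
            "message": record.getMessage(),
        }
        return json.dumps(entry)


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(StandardJsonFormatter("111122223333", "orders-api"))
    logger = logging.getLogger("demo")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.info("order %s created", "A-42")
```

A shared formatter like this could be distributed by the CCoE as part of a blueprint, so that every spoke account produces logs the central dashboard can parse without per-project adapters.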
With detective controls, AWS CloudTrail and Amazon CloudWatch can alert and act on operational or application issues. AWS Trusted Advisor provides guidance on how you can optimize your infrastructure in terms of costs, performance, security, and resilience.
Responsive controls process events from cloud resources with the goal of continuously auditing and assessing the overall compliance of AWS infrastructure. AWS Config enables you to assess and evaluate the configurations of your AWS resources.
With AWS Config rules, the CCoE can implement detective controls by processing events from resources, and create real-time remediation actions using AWS Systems Manager or AWS Lambda. For example, you can implement a Config rule that assesses the configuration of security groups or bucket policies and fixes them automatically in case of misconfiguration.
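The evaluation logic behind such a rule can be sketched as a pure Python function, of the kind a custom Config rule backed by Lambda might call. This is a simplified illustration: the shape of the `configuration` dictionary (`ipPermissions`, `ipv4Ranges`, `cidrIp`) is an assumption about how a security group configuration item looks, and the choice of "SSH open to the world" as the misconfiguration is just an example.

```python
OPEN_CIDR = "0.0.0.0/0"
SSH_PORT = 22


def find_open_ssh_rules(configuration: dict) -> list:
    """Return the ingress permissions that expose SSH to any IPv4 address."""
    offending = []
    for permission in configuration.get("ipPermissions", []):
        protocol = permission.get("ipProtocol")
        from_port = permission.get("fromPort")
        to_port = permission.get("toPort")

        if protocol == "-1":
            # "-1" means all protocols and ports, so SSH is covered.
            covers_ssh = True
        elif protocol == "tcp" and from_port is not None and to_port is not None:
            covers_ssh = from_port <= SSH_PORT <= to_port
        else:
            covers_ssh = False

        cidrs = [r.get("cidrIp") for r in permission.get("ipv4Ranges", [])]
        if covers_ssh and OPEN_CIDR in cidrs:
            offending.append(permission)
    return offending


def evaluate_security_group(configuration: dict) -> str:
    """Detective part: map the finding to a compliance result."""
    return "NON_COMPLIANT" if find_open_ssh_rules(configuration) else "COMPLIANT"


if __name__ == "__main__":
    sample = {
        "ipPermissions": [
            {
                "ipProtocol": "tcp",
                "fromPort": 22,
                "toPort": 22,
                "ipv4Ranges": [{"cidrIp": "0.0.0.0/0"}],
            }
        ]
    }
    print(evaluate_security_group(sample))
    # Responsive part (not executed here): a remediation could pass the
    # offending permissions to EC2's revoke_security_group_ingress call.
    print(find_open_ssh_rules(sample))
```

Keeping the evaluation separate from the remediation makes the rule easy to test offline and lets the CCoE decide per OU whether a finding should only raise an alarm or also trigger an automatic fix.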
Figure 5 – AWS Config rules in action.
Technologies: Controls Over Costs
Safety is also a budget and cost management issue. With agility in mind, we must allow builders to create new cloud resources without losing control of the costs.
AWS provides several services to help you monitor costs:
- AWS Cost Explorer enables you to view and analyze your costs and usage (detective control).
- AWS Budgets enables simple cost and usage tracking (preventive control).
- Anomaly Detection is an AWS Cost Management feature that uses machine learning to continuously monitor your cost and usage to detect unusual spending; it’s a detective control, but if you integrate it with Amazon Simple Notification Service (SNS) and Lambda it becomes a responsive control.
The real question here is: How can we speed up the process of budget allocation without slowing down the creation of a new account? There are, of course, many ways to do this. Usually, a company’s business units already have a budget allocated, so you only need to reallocate it to the accounts.
For example, we can implement a budget allocation process between builders and product owners using AWS Chatbot. The builders ask for a new account on a Slack channel, and the product owner answers “yes” and sets the budget by running AWS Command Line Interface (CLI) commands in the channel.
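To make the budget step concrete, the sketch below builds the parameters for a monthly cost budget with an SNS alert. It is an illustration under assumptions: the function name, account ID, budget name, and topic ARN are hypothetical, and the dictionary follows my understanding of the AWS Budgets `CreateBudget` request shape, which should be checked against the API reference before use. The code only builds the request; it does not call AWS.

```python
def build_budget_request(
    account_id: str,
    budget_name: str,
    monthly_limit_usd: float,
    alert_threshold_percent: float,
    sns_topic_arn: str,
) -> dict:
    """Build keyword arguments for a monthly cost budget with one alert."""
    return {
        "AccountId": account_id,
        "Budget": {
            "BudgetName": budget_name,
            # The Budgets API expresses amounts as strings.
            "BudgetLimit": {"Amount": f"{monthly_limit_usd:.2f}", "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        "NotificationsWithSubscribers": [
            {
                "Notification": {
                    "NotificationType": "ACTUAL",
                    "ComparisonOperator": "GREATER_THAN",
                    "Threshold": alert_threshold_percent,
                    "ThresholdType": "PERCENTAGE",
                },
                # Publishing to SNS lets a Lambda function react to the alert.
                "Subscribers": [
                    {"SubscriptionType": "SNS", "Address": sns_topic_arn}
                ],
            }
        ],
    }


if __name__ == "__main__":
    request = build_budget_request(
        account_id="111122223333",  # placeholder account ID
        budget_name="project-x-monthly",
        monthly_limit_usd=500,
        alert_threshold_percent=80.0,
        sns_topic_arn="arn:aws:sns:eu-west-1:111122223333:budget-alerts",
    )
    print(request["Budget"])
```

With an SDK such as boto3, a dictionary like this could be unpacked into the Budgets client's `create_budget` call, so the same definition can be reused for every new account the CCoE provisions.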
Governance in the cloud is a mix of process, people, and technologies that drive cloud adoption and improve an organization’s agility without compromising safety. Remember, I use the term “safety” because cloud governance is not only about (cyber) security.
Creating a decentralized governance model permits great control over standards and costs (safety) while allowing better responsiveness (agility). It’s important to develop a critical mass of people with AWS experience, establish operational processes, and form a Cloud Center of Excellence (CCoE) that’s dedicated to mobilizing the appropriate resources and defining best practices.
Using AWS Management and Governance services, the CCoE can provide preconfigured services or resources to builders, and define preventive and detective controls that reduce vulnerabilities and maintain compliance across the organization.
The content and opinions in this blog are those of the third-party author and AWS is not responsible for the content or accuracy of this post.
Claranet – AWS Partner Spotlight
Claranet is an AWS Premier Consulting Partner and a European leader in application management that specializes in providing managed AWS solutions to mid-size organizations.