AWS Cloud Operations Blog
How to perform a Well-Architected Framework Review - Part 2
A successful Well-Architected Framework Review (WAFR) has three phases: Prepare, Review, and Improve. In part 1 of this blog series, we discussed the preparation phase. In this part, we dive deep into best practices for the second phase, the actual review.
Figure-1 WAFR Phases
Assuming you followed the recommendations for the preparation phase, at this point you should have identified the workload you would like to review, identified sponsors, decided on the pillars to review and their priority, and decided on which lens to use (if any) and the format of the review session. You should also have collected the data on your workload that you need to answer the review questions.
The goal of WAFR
Before we dive deep into some recommendations for a successful WAFR, it’s important to recap that the ultimate goal of the review is to improve system architectures so that these systems can better support business needs. The architecture improvement process starts by reviewing the current architecture and comparing it against best practices. You do so by answering the review questions, with one set of questions for each pillar. Based on the answers, we identify areas to improve, also called High-Risk Issues (HRIs) and Medium-Risk Issues (MRIs). Next, we create a treatment plan to remediate these risks using a priority-based approach.
WAFR best practices
1- Set expectations. WAFR is a big time commitment for all participants. Take the time to have this conversation with the main stakeholders in advance, so they are clear on the expectations and their roles before, during, and after the review. Make sure you get their support.
2- Conversations, not an audit. The best results we see from WAFR sessions come when stakeholders treat them as conversations, not checklists or scoring exercises. Team members speak openly about their systems when they know they will not be blamed for missing some best practices, and that openness helps uncover architectural risks.
3- Team sport. Everyone on the team should play a role. For example, a pillar sponsor should make sure that all questions in the pillar are answered correctly. The sponsor should then own the improvement plan for risks identified during the review. This becomes more important in the Improve phase of the review, where different teams need to engage in setting priorities and finding solutions for the risks identified. This is discussed in part 3 of this blog series.
4- Continuous check, not a one-time effort. Things always change and should be kept in check. I recommend creating a practice within the organization to conduct WAFR (or a customized version of it) on a regular basis, or following big milestones in the workload’s lifecycle, such as going from Test to Production.
5- Earlier is better. It’s easier to influence decisions and drive changes while things are still in the design phase and not in production.
6- Use the AWS Well-Architected Tool (AWS WA Tool)
The WAFR questions are available as a whitepaper. However, I recommend that you use the AWS WA Tool for the review. The tool enables you to track questions, take notes, create milestones, understand each question’s context, understand the best practice being validated, and explore additional resources for that best practice in blogs, re:Invent talks, or documentation.
The AWS WA Tool also lets you create custom lenses. With a custom lens, you can create your own pillars, questions, and best practices. You can tailor the questions in a custom lens to a particular technology, which helps you meet the governance needs within your organization.
Check out these examples:
- Customize Well-Architected Reviews using Custom Lenses and the AWS Well-Architected Tool
- Implementing the AWS Well-Architected Custom Lens lifecycle in your organization
- Best Practices for the Custom Lens Lifecycle: Plan and Implement
- Best Practices for the Custom Lens Lifecycle: Measure and Improve
During the review phase, it’s important to take the notes necessary to explain whether a specific best practice is in place or not. If it’s implemented, you tick its checkbox in the question. If not, use the notes area to explain why it has not been implemented. Is it on the roadmap? Does it conflict with other requirements? Was it simply missed? The answers to these questions will help the team later when creating an improvement plan. They will also help other reviewers, because teams and owners may change.
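If you prefer to capture answers programmatically rather than in the console, the same information can be written through the AWS Well-Architected Tool API. The following is a minimal sketch, not a definitive implementation. It assumes a boto3 `wellarchitected` client is passed in; the helper name `record_answer` and the placeholder workload, question, and choice IDs in the usage comment are hypothetical.

```python
def record_answer(client, workload_id, question_id, implemented, notes,
                  lens_alias="wellarchitected"):
    """Save the ticked best practices and the reviewer's notes for one question.

    `client` is expected to behave like boto3.client("wellarchitected").
    `implemented` holds the choice IDs of the best practices that are in place.
    `notes` explains why the remaining best practices are not implemented.
    """
    return client.update_answer(
        WorkloadId=workload_id,
        LensAlias=lens_alias,
        QuestionId=question_id,
        SelectedChoices=list(implemented),
        Notes=notes,
    )


# Example usage (requires AWS credentials; all IDs below are placeholders):
#   import boto3
#   client = boto3.client("wellarchitected")
#   record_answer(client, "<workload-id>", "<question-id>", ["<choice-id>"],
#                 "Restore tests not automated yet; on the roadmap.")
```

Passing the client in as a parameter keeps the helper easy to test and lets you reuse one client across many questions.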
Milestones are another feature I recommend you use. A milestone records the state of a workload at a particular point in time. When you conduct multiple sessions, or when you work on improvement items, you can save a milestone to measure progress as you go.
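Milestones can also be saved through the API, for example at the end of each review session. Here is a small sketch under stated assumptions: the helper name `save_milestone` and the "label plus date" naming convention are my own illustration, and the client is expected to be a boto3 `wellarchitected` client.

```python
from datetime import date


def save_milestone(client, workload_id, label, on=None):
    """Create a milestone named '<label>-<YYYY-MM-DD>' and return (name, number).

    `client` is expected to behave like boto3.client("wellarchitected").
    `on` defaults to today's date; pass a date explicitly for repeatable names.
    """
    on = on or date.today()
    name = f"{label}-{on.isoformat()}"
    response = client.create_milestone(WorkloadId=workload_id, MilestoneName=name)
    return name, response["MilestoneNumber"]


# Example usage (requires AWS credentials; the workload ID is a placeholder):
#   import boto3
#   client = boto3.client("wellarchitected")
#   save_milestone(client, "<workload-id>", "review-session-1")
```

Saving one milestone per session, and another after each batch of improvements, gives you points in time to compare risk counts against.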
7- Maximize time
WAFR should be short and should be completed in hours, not days. To keep the review process concise, it’s important to balance asking follow-up questions that validate best practices against staying within the question’s context, without spending too much time on deep technical discussions.
For example, monitoring is a topic that we bring up across all six pillars. However, the context varies per pillar. When reviewing the Operational Excellence Pillar, monitoring is about observability and understanding the workload’s health by establishing metrics and KPIs. In the Security Pillar, the monitoring context shifts to auditing environments, tracing malicious activities, understanding unauthorized behavior, and so on.
Also avoid going deep into technical discussions during the review, such as the configuration details of a service. Similarly, avoid jumping into solutions, because you likely will not have enough time or the necessary details during the review to recommend the right solution on the spot. Instead, take notes and follow up on the topic as part of the Improve phase, as we will see in part 3.
8- Maybe is No
In some cases, the team will not be sure whether a best practice is implemented or not. In this case, treat the best practice as not implemented and document that in the notes in the AWS WA Tool. This way, you can include the solution (or further validation) as a follow-up item in the Improve phase.
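The "maybe is no" rule can be sketched in a few lines. This is an illustrative sketch only: the `ReviewedPractice` type, the `summarize_question` helper, the status strings, and the choice IDs are all hypothetical and are not part of the AWS WA Tool.

```python
from dataclasses import dataclass


@dataclass
class ReviewedPractice:
    choice_id: str     # hypothetical best-practice identifier
    status: str        # "yes", "no", or "maybe", as stated by the team
    comment: str = ""  # optional context captured during the session


def summarize_question(practices):
    """Return (selected_choice_ids, notes), treating 'maybe' as not implemented."""
    selected = []
    note_lines = []
    for practice in practices:
        status = practice.status.strip().lower()
        if status == "yes":
            # Only confirmed best practices get their checkbox ticked.
            selected.append(practice.choice_id)
        elif status == "maybe":
            line = f"{practice.choice_id}: unconfirmed, treated as not implemented. {practice.comment}"
            note_lines.append(line.strip())
        else:
            line = f"{practice.choice_id}: not implemented. {practice.comment}"
            note_lines.append(line.strip())
    return selected, "\n".join(note_lines)


practices = [
    ReviewedPractice("bp_a", "yes"),
    ReviewedPractice("bp_b", "Maybe", "Needs validation with the ops team"),
    ReviewedPractice("bp_c", "no"),
]
selected, notes = summarize_question(practices)
print(selected)
print(notes)
```

Keeping the "unconfirmed" wording in the notes distinguishes items that need validation from items that are known gaps when you build the improvement plan.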
9- Scale and automate as necessary
For large organizations with many workloads, consider building automated and scalable processes to review workloads, identify risks, and remediate them.
Here are a few examples, created by my colleagues, of how to integrate WAFR into your organization. You can adjust and reuse these solutions as they fit your organization.
- Create and update Well-Architected reviews using AWS CloudFormation (Lab).
- Build custom reports of Well-Architected Review (Lab): An example of integrating AWS Well-Architected data into a centralized reporting tool using the AWS Well-Architected Tool API.
- Scaling AWS Well-Architected Reviews through the enterprise (re:Invent 2022 session): An example of creating a standardized and consistent approach to reviewing workloads and building scalable architectural health reporting.
- Cloud optimization with Trusted Advisor and AWS Well-Architected Tool (re:Invent 2022 session): This session shows you how to integrate the AWS Well-Architected Framework, AWS Well-Architected Tool, and AWS Trusted Advisor to identify opportunities for cloud optimization.
- AWS Well-Architected best practices for DevOps on AWS (re:Invent 2022 session): An example of aligning your organization’s DevOps practices to the pillars of the AWS Well-Architected Framework.
- Accelerating Well-Architected Framework reviews using integrated AWS Trusted Advisor insights (blog post).
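As a starting point for this kind of reporting, risk counts can be collected across all workloads in an account. The sketch below is one possible approach, not the method used in the resources above. It assumes a boto3 `wellarchitected` client is passed in and that each workload summary returned by `list_workloads` carries a `RiskCounts` map with `HIGH` and `MEDIUM` keys; the helper name `collect_risk_counts` is hypothetical.

```python
def collect_risk_counts(client):
    """Return (workload_name, high, medium) tuples, highest risk first.

    `client` is expected to behave like boto3.client("wellarchitected").
    Pages are followed manually through the NextToken field.
    """
    rows = []
    token = None
    while True:
        kwargs = {"NextToken": token} if token else {}
        page = client.list_workloads(**kwargs)
        for summary in page.get("WorkloadSummaries", []):
            counts = summary.get("RiskCounts", {})
            rows.append((
                summary.get("WorkloadName", ""),
                counts.get("HIGH", 0),
                counts.get("MEDIUM", 0),
            ))
        token = page.get("NextToken")
        if not token:
            break
    # Sort by high-risk count, then medium-risk count, then name.
    rows.sort(key=lambda row: (-row[1], -row[2], row[0]))
    return rows


# Example usage (requires AWS credentials):
#   import boto3
#   for name, high, medium in collect_risk_counts(boto3.client("wellarchitected")):
#       print(f"{name}: {high} HRIs, {medium} MRIs")
```

Sorting by high-risk count first mirrors the priority-based approach to remediation: the workloads at the top of the list are the ones to look at first.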
Summary
In this blog post, I shared some of the lessons learned from conducting many Well-Architected Framework Reviews with customers from different industries. The ultimate goal of WAFR is to identify architecture risks and address them. To get there, you first need to review your workload architecture against AWS best practices. There are a few recommendations to follow and anti-patterns to avoid when running WAFR. The review has to be conversational, honest, documented, and finished in days, not weeks. If you run reviews for multiple workloads, you need to automate and scale the process in line with your organization’s best practices. I shared some resources from our SAs and customers that show examples of how to do so. In the next step, after you have identified risks, you need to create an improvement plan to address them. This will be covered in part 3.