How to Automate and Simplify Ongoing Adherence to the AWS Well-Architected Framework Using Coral
By Gil Hecht, Founder and CEO at Continuity Software
Since 2005, Continuity Software has been helping enterprises achieve resilience in their IT environments, drastically reducing outages, and protecting data before it is lost, damaged, stolen, or unrecoverable.
Continuity’s methodology proactively detects misconfigurations and potential single points of failure, as well as other risks throughout customers’ critical infrastructure, to enable remediation before disruptions can occur.
Our solutions help enterprises assure reliability and data protection in multi-cloud and hybrid environments. Specifically, the Coral solution for Amazon Web Services (AWS) automates adherence to all five pillars of the AWS Well-Architected Framework. Coral is also now integrated with the AWS Well-Architected Tool.
Continuity Software is an AWS Advanced Technology Partner with the AWS Cloud Management Tools Competency.
This post describes the Coral solution for AWS, which makes adherence to Well-Architected a routine, simple, and automated activity. I’ll discuss the importance of the AWS Well-Architected Framework and point to the conditions in the AWS Cloud that challenge continuous Well-Architected adherence.
The post also clarifies for AWS Consulting Partners how they can use Coral to quickly and accurately conduct Well-Architected Reviews. This includes onboarding end customers to Coral, discovering actual deviations from Well-Architected adherence, and in light of these, encouraging customers to make improvements with the help of AWS credits to offset costs.
To illustrate the above, I will present the recent experience of CloudZone, an AWS Premier Consulting Partner which used Coral for a fintech customer. I’ll provide detailed examples of misconfigurations discovered by Coral, and the steps to remediation.
About the AWS Well-Architected Tool
AWS developed and promotes the Well-Architected Framework, which helps cloud architects build secure, high-performing, resilient, and efficient infrastructure for their applications and workloads. This enables customers to obtain better outcomes, develop and deploy faster, lower and/or mitigate risks, and make informed decisions.
The AWS Well-Architected Tool helps ensure cloud architects follow framework principles. Its aim is to measure and improve workloads based on customers’ business and technical objectives, and ensure their architecture facilitates achievement of those objectives.
The Well-Architected Tool is a set of questions about workloads to which AWS customers provide the answers. Based on the results, the tool issues a plan for building architecture that adheres to the Well-Architected Framework.
Well-Architected guidelines are invaluable because they help customers create optimal infrastructure. The question is how to adhere to the guidelines consistently and use the Well-Architected Tool for true, ongoing compliance.
Challenges of using the AWS Well-Architected Tool can include:
- The Q&A compliance review is led by an AWS Consulting Partner or Solutions Architect, and conducted together with the AWS customer. The process is high-level, manual, and time-consuming.
- The Consulting Partner does not know whether the answers selected by the customer offer the best path to compliance based on actual workloads, or whether the customer will follow through and make the changes once the review is completed.
- Finally, the process is performed only quarterly. Cloud conditions, however, can change minute by minute and affect infrastructure stability and security. This can make the information provided by the customer irrelevant later on.
Cloud conditions and phenomena that undermine Well-Architected compliance may include:
- New services and features are continually released, and it’s impossible to validate them all before each production update. This leads to misconfigurations and injects risks into production environments.
- The multiple DevOps and development groups that maintain the environment are not always aligned with up-to-date best practices or aware of other teams’ changes. This impacts robustness and can lead to outages, security breaches, costly performance disruptions, and data-loss incidents.
Insights Regarding Compliance and Best Practices
There can be drawbacks to using the AWS Well-Architected Tool as a means of ongoing adherence to the Framework. That’s where Coral’s automated and proactive solution comes in.
Coral is a software-as-a-service (SaaS) solution that brings specificity, insight, and simplicity to AWS Well-Architected compliance. It detects misconfigurations and risks across all layers of your AWS environment, including virtual machines, containers, networks, load balancers, databases, cloud storage, DNS, and more.
Coral’s rapid but in-depth and precise analysis, performed on actual workloads, detects risks and misconfigurations while providing specific recommendations and remediation guidelines. It yields insights regarding compliance and proper use of AWS resources, helping customers build better, healthier infrastructure over time.
The AWS Consulting Partner or Solutions Architect conducting reviews can see real risks immediately using Coral. Consequently, they are able to formulate a more focused plan for architecture improvement.
Figure 1 – Results of a scan for adherence to the Reliability pillar: Question 7 on the Well-Architected Tool.
Proprietary Knowledge Base
One of Coral’s main features is its use of Continuity’s proprietary knowledge base, which contains hundreds of up-to-date best practices for robustness (with more added every day) drawn from vendors, industry, power users, and cloud providers. This includes AWS best practices for adherence to Well-Architected guidelines.
Coral’s built-in risk detection engine, which utilizes machine learning (ML) algorithms, runs analyses against this knowledge base. When a risk is detected, a detailed description and the recommended path for resolution are presented. Thus, problems are identified before they can cause harm.
Coral’s clear guidelines for remediation enable users to fix problems and prevent them from eventually disrupting operations. Its automated self-healing makes maintaining a reliable and secure environment even simpler.
Visibility and Control
Coral’s intuitive dashboard is an integral component of the solution. Users immediately see any misconfigurations and flaws threatening robustness in all of their multiple and geographically dispersed cloud environments.
Clicking on the dashboard compliance view displays areas of non-compliance with the Well-Architected Framework pillars—per account and workload.
Users see each environment’s health status with respect to risks and configurations, as well as the potential impact each risk may have on critical business services along with level of urgency. Organizations can track adherence improvement over time.
Use Case: How CloudZone Uses Coral
CloudZone has been working with AWS for a decade and is a Continuity Software partner. It supports enterprises at every stage in their cloud journey with a wide variety of expertise and services, including quarterly reviews of adherence to the AWS Well-Architected Framework.
Utilizing Coral’s abilities to automatically pinpoint the risks in the customer’s production workloads on AWS, CloudZone has devised its own methodology for conducting the Well-Architected review.
The method they use with customers consists of the following steps:
- The AWS customer is onboarded to a (free trial) Coral account for the workload being reviewed. All of the risk indicators, as they impact adherence to the Well-Architected Framework’s five pillars, are quickly collected.
- Based on the information resulting from the Coral scan, CloudZone and the AWS customer discuss the findings.
- In addition to actual Coral insights, CloudZone uses the Well-Architected Tool to get a more high-level picture and further understand the customer’s goals and methodologies for reaching them.
- CloudZone develops a Statement of Work (SOW) for the customer, reflecting the changes that need to be made to reach their goals. This is based on the Well-Architected Tool’s information and any misconfigurations and errors detected through Coral.
- After receiving the customer’s go ahead, CloudZone repairs the risks discussed. AWS provides customers with a $5,000 credit when a minimum of 25 percent of high-risk indicators (HRIs) are improved. This helps pay for some or all of the work performed by the AWS Consulting Partner.
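The credit condition in the last step is a simple ratio check. The following is a minimal sketch of that arithmetic only. The function name and the sample counts are hypothetical, and how AWS actually counts improved HRIs is not specified here.

```python
def meets_hri_threshold(hris_identified: int, hris_improved: int,
                        threshold_percent: int = 25) -> bool:
    """Return True when the improved share of HRIs reaches the threshold.

    Integer arithmetic is used so that exactly 25 percent counts as
    meeting the minimum, without floating-point rounding surprises.
    """
    if hris_identified <= 0 or hris_improved < 0:
        return False
    return hris_improved * 100 >= hris_identified * threshold_percent


# Hypothetical review: 40 HRIs identified in the workload.
print(meets_hri_threshold(40, 10))  # 10 of 40 is exactly 25 percent
print(meets_hri_threshold(40, 9))   # 9 of 40 is 22.5 percent
```

A partner could run a check like this against the HRI counts before and after remediation to see whether the work performed reaches the minimum described above.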
CloudZone CRO Ori Tabachnik points out that, “Frequently, work revolves around architectural decisions that were made and as a result of the review, infrastructure fixes that need to be made.”
CloudZone Customer Success
A fintech company, born on the AWS Cloud, recently approached CloudZone. They were about to integrate new critical applications for use by their customers who run millions of accounts worldwide. The fintech wanted to ensure their infrastructure was stable and could run the new apps without any disruptions.
An integral part of rapid deployment and providing uninterrupted service to their customers was adherence to the Well-Architected Framework’s five pillars. This was the standard the fintech adopted.
The objectives the fintech wanted to achieve were:
- Ability to rapidly deploy innovative financial applications.
- Delivery of the highest levels of reliability and security for mission-critical workloads on AWS.
CloudZone performed the Well-Architected Review process, and Coral scanned:
- AWS environment: one AWS account including 468 Amazon EC2 instances, 166 load balancers, and four Amazon RDS instances.
- Tight integration with CI/CD pipeline to achieve rapid deployment. Each build tested in dev > staging > prod.
According to CRO Ori Tabachnik, “Within a few minutes of onboarding the customer, we got the answers to our big question—where are the risks?”
Coral Scan Results
Across the 38 distinct checks that registered violations for the fintech, Coral revealed 276 risks of non-compliance with Well-Architected guidelines, including significant downtime, data-loss, and security risks.
These affected the resilience of the AWS environment. Risk levels detected were: 72 high risks, 34 medium risks, and 146 low risks.
The scan revealed the downtime, security, and data loss risks spanned all layers of the infrastructure, including auto scaling groups, content delivery networks, virtual machines, storage, load balancers, and more.
With respect to Well-Architected, major instances of non-adherence to the reliability and security pillars were seen.
Figure 2 – Coral compliance view showing Well-Architected best practices rules violated.
Presented below are three of the 276 risks detected by Coral’s Well-Architected Review, examining the fintech’s practices with respect to the reliability and security pillars.
Example 1: Reliability Pillar – High Risk of Application Downtime
This question-answer sequence refers to: RELIABILITY pillar, Question 2. How do you manage your network topology?
The answer the fintech selected was: “Use highly available connectivity between private addresses in public clouds and on-premises environment.”
In practice, Coral discovered the fintech was in violation of the following rule: “Site-to-site VPN connections must have more than one active tunnel.”
The fintech’s site-to-site virtual private network (VPN) connections had only one active tunnel. This represents a single point of failure that may lead to network disconnects between the data center and the virtual private cloud (VPC), creating a high risk of application downtime.
It also speaks directly to the type of risk the fintech wanted to make sure it was avoiding. To repair the risk, CloudZone had to ensure there were at least two active tunnels available.
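A check of this kind can be approximated outside Coral. The sketch below is not Coral's implementation. It assumes input shaped like the response of the Amazon EC2 `DescribeVpnConnections` API (for example, as returned by boto3's `describe_vpn_connections`), where each connection carries a `VgwTelemetry` list with a `Status` of `UP` or `DOWN` per tunnel. The function name and sample data are hypothetical.

```python
def vpn_connections_lacking_redundancy(response: dict) -> list:
    """Return IDs of VPN connections with fewer than two tunnels reporting UP."""
    flagged = []
    for conn in response.get("VpnConnections", []):
        # Ignore connections that are being torn down or are already gone.
        if conn.get("State") in ("deleting", "deleted"):
            continue
        tunnels_up = sum(
            1 for tunnel in conn.get("VgwTelemetry", [])
            if tunnel.get("Status") == "UP"
        )
        if tunnels_up < 2:
            flagged.append(conn["VpnConnectionId"])
    return flagged


# Hypothetical sample data, shaped like a DescribeVpnConnections response.
sample_vpn_response = {
    "VpnConnections": [
        {"VpnConnectionId": "vpn-example-a", "State": "available",
         "VgwTelemetry": [{"Status": "UP"}, {"Status": "DOWN"}]},
        {"VpnConnectionId": "vpn-example-b", "State": "available",
         "VgwTelemetry": [{"Status": "UP"}, {"Status": "UP"}]},
    ]
}

print(vpn_connections_lacking_redundancy(sample_vpn_response))
# prints ['vpn-example-a']
```

Keeping the check as a pure function over the response data makes it easy to run against saved API output, so the same logic can be re-run after remediation to confirm both tunnels are up.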
Example 2: Security Pillar – High Risk to Security
This question-answer sequence refers to: SECURITY pillar, Question 6. How do you protect your networks?
The answer the fintech selected was: “Limit exposure.”
Coral discovered the fintech was in violation of the following rule: “Public EC2 instances do not allow unrestricted TCP access.”
The fintech’s Amazon EC2 instances in region sa-east-1 with a public IP address had unrestricted TCP access. Unrestricted access makes the system vulnerable to malicious activity.
To repair this violation, the recommendation was to restrict access to specific security groups or IP addresses (that require it), and to implement the principle of least privilege in order to reduce the possibility of a breach.
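To make the rule concrete, here is a minimal sketch, not Coral's implementation, that scans one security group for inbound TCP rules open to any address. It assumes a dictionary shaped like an entry in the Amazon EC2 `DescribeSecurityGroups` response (`IpPermissions`, `IpRanges`, `Ipv6Ranges`). The function name and sample group are hypothetical, and matching groups to instances with public IP addresses is left out.

```python
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}  # "anywhere" for IPv4 and IPv6


def unrestricted_tcp_rules(security_group: dict) -> list:
    """Return (from_port, to_port) for each inbound rule that allows TCP
    from any address. Protocol "-1" means all protocols, which includes
    TCP; such rules carry no port range, so they yield (None, None)."""
    findings = []
    for perm in security_group.get("IpPermissions", []):
        if perm.get("IpProtocol") not in ("tcp", "-1"):
            continue
        cidrs = [r.get("CidrIp") for r in perm.get("IpRanges", [])]
        cidrs += [r.get("CidrIpv6") for r in perm.get("Ipv6Ranges", [])]
        if any(cidr in OPEN_CIDRS for cidr in cidrs):
            findings.append((perm.get("FromPort"), perm.get("ToPort")))
    return findings


# Hypothetical security group: SSH open to the world, HTTPS internal only.
sample_group = {
    "GroupId": "sg-example",
    "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}], "Ipv6Ranges": []},
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "10.0.0.0/8"}], "Ipv6Ranges": []},
    ],
}

print(unrestricted_tcp_rules(sample_group))
# prints [(22, 22)]
```

Each reported port range is a candidate for the remediation described above: replacing the open CIDR with the specific security groups or IP addresses that require access.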
Example 3: Security Pillar – High Risk to Security and of Downtime
This question-answer sequence refers to: SECURITY pillar, Question 9. How do you protect your data at rest?
The fintech selected: “Enforce encryption at rest.”
Coral discovered the fintech was in violation of the following rule: “RDS with storage encryption.”
The Amazon RDS instances in region sa-east-1 did not have storage encryption. When a database is created without encryption, storage encryption cannot be enabled on that instance afterward. If storage encryption is needed later, introducing it can lead to downtime and requires re-architecting the fintech’s data protection methods.
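As an illustration of what this rule inspects, the sketch below lists database instances whose storage is not encrypted. It is not Coral's implementation. It assumes input shaped like the Amazon RDS `DescribeDBInstances` response, in which each instance has a `DBInstanceIdentifier` and a boolean `StorageEncrypted` field. The function name and sample identifiers are hypothetical.

```python
def unencrypted_db_instances(response: dict) -> list:
    """Return identifiers of DB instances without storage encryption."""
    return [
        db["DBInstanceIdentifier"]
        for db in response.get("DBInstances", [])
        # A missing field is treated as unencrypted, to err on the safe side.
        if not db.get("StorageEncrypted", False)
    ]


# Hypothetical sample data, shaped like a DescribeDBInstances response.
sample_rds_response = {
    "DBInstances": [
        {"DBInstanceIdentifier": "payments-db", "StorageEncrypted": False},
        {"DBInstanceIdentifier": "ledger-db", "StorageEncrypted": True},
    ]
}

print(unencrypted_db_instances(sample_rds_response))
# prints ['payments-db']
```

For instances flagged this way, a commonly used path is to take a snapshot, create an encrypted copy of the snapshot, and restore a new instance from that copy. The cutover to the new instance is where downtime can arise.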
Bottom Line Results
With the Coral scan results in hand, detailing 276 risks in one AWS account, CloudZone was able to have a very focused, results-oriented discussion with the fintech about what repairing the risks involved, what to prioritize, how to achieve reliability and security, and how to maintain it.
Using Coral, CloudZone gains visibility into the customers’ environments and immediately receives a full picture of their health status. This allows them to engage in a rapid and cost-efficient process from which both they and the customer emerge with a realistic remediation plan that can be implemented.
In our example, the fintech accepted the SOW submitted by CloudZone following the Well-Architected Review and quickly saw an 80 percent reduction in the number and severity of downtime, data recoverability, and security issues.
They took CloudZone’s advice and implemented the full Coral solution on their site. Their environment now benefits from continuous assurance.
The fintech handles risk repair on their own. They follow the remediation guidelines provided by the solution, repair risks and misconfigurations, and have a much better understanding of the health status of their environments.
Continuity Software has an API integration of the Coral solution with the AWS Well-Architected Tool. In this post, we described how Coral ensures environments adhere to the Well-Architected Framework to drive better outcomes for workloads while helping customers avoid disruptions resulting from non-adherence.
The CloudZone use case shows one instance of how an AWS Consulting Partner creatively and effectively uses Coral to drastically reduce and optimize time spent on the Well-Architected compliance reviews. This helps them obtain precise results, quickly uncover risks in AWS customer workloads, and repair them.
Likewise, the use case presented the customer’s perspective and showed how, using Coral, a fintech benefitted from continuous assurance with respect to all the Well-Architected pillars, particularly the two most critical for them: reliability and security.
Free Trial for AWS Consulting Partners
Continuity Software is offering a 14-day free trial of Coral. It’s available to AWS Consulting Partners that have been certified to perform Well-Architected workload reviews and are part of the AWS Well-Architected Partner Program.
In addition, any AWS customer with a significant critical production workload (50+ nodes) is invited to take advantage of this offer. The trial is for a single user and includes unlimited accounts with a total of up to 500 nodes.
The trial yields reports on primary risks and provides instructions on how to repair them. This can lead to greater security, reliability, and adherence to AWS Well-Architected in the environments running Coral.
The content and opinions in this blog are those of the third-party author and AWS is not responsible for the content or accuracy of this post.
Continuity Software – AWS Partner Spotlight
Continuity Software is an AWS Cloud Management Tools Competency Partner that addresses resilience assurance challenges in public cloud and proactively prevents outages and data-loss risks.