AWS for Industries
Automated remediation: Securing the Volkswagen AWS landing zone at scale
In 2019, Volkswagen AG (VW) and Amazon Web Services (AWS) formed a strategic collaboration to co-develop the Digital Production Platform (DPP), aiming to enhance VW’s production and logistics efficiency by 30 percent while reducing production costs by the same margin.
As the DPP infrastructure expands, maintaining its security becomes a critical priority. This can be achieved through a comprehensive approach that integrates preventative, detective, and responsive mechanisms, helping to ensure robust protection in a dynamic cloud environment. The DPP landing zone increasingly faced this challenge because the number of accounts grew quickly each quarter. With over 180 projects, the number of vulnerabilities caused by AWS account users misconfiguring AWS resources grew rapidly, increasing potential risk for the company. Examples of these misconfigurations are publicly accessible Amazon Simple Storage Service (Amazon S3) buckets or internet-exposed Amazon Relational Database Service (Amazon RDS) database instances.
In this blog post, we’ll discuss an automated process that helps the VW AWS account security team remediate vulnerabilities in the AWS landing zone by reconfiguring AWS resources. This is done at a controlled pace by VW’s DPP security team to help minimize operational impact to the projects while helping manage exposure risk.
“Automated remediation has been a game changer for VW’s security posture. It helped us achieve a higher company level compliance posture, reduce risk, and increase developer productivity while significantly reducing our response time,” says Dr. Stephan Teuber, security officer for the VW DPP.
For example, the automated remediation initiative increased VW’s overall level of fixed security issues from 95 percent to 99.9 percent for critical vulnerabilities and from 85 percent to 98 percent for high-severity vulnerabilities. We deployed over 15 remediation modules covering AWS Security Hub and partner tool findings. In its first year of operation, the system performed over 100,000 remediations (approximately 8,300 actions per month) and is now operating at a steady state of 3,500 remediation actions per month in 1,200 AWS accounts.
As team members from AWS and VW, we collaborated to develop an innovative approach for automated remediation of vulnerabilities. VW, with the support of AWS, defined the following requirements to help scale at the same speed as the DPP platform while keeping the potential operational impact to VW as low and controlled as possible:
- A remediation is defined as the correction of a vulnerable configuration of an AWS service. This correction is needed to help mitigate a potential security vulnerability in the service configuration introduced by the AWS account owners. These vulnerabilities are detected and reported by AWS Security Hub.
- A clear decision process for new remediations: To decide which vulnerabilities to remediate automatically, the team collects and analyzes data to evaluate the potential negative impact on all of VW’s AWS accounts.
- User-facing communication to introduce new remediations: To help ensure every user of the landing zone is aware and prepared, new remediations must be communicated 3 weeks in advance of launch to non-production accounts and 3 months in advance to production accounts.
- A controlled pace of remediation: To limit the risk of an unexpected negative impact on the remediated resource configurations, the number of remediation actions must not exceed a specified threshold. The threshold of actions per run must be evaluated based on the remediation complexity and the number of affected resources. This keeps the pace of remediation actions under control and avoids throttling or resource exhaustion issues.
- Per-account remediation: The remediations must be grouped per account to maintain a clear log record of actions taken and potential errors in a central log space. This simplifies troubleshooting and error handling when remediation actions fail. In case of an error, a single log record shows all remediations of one specific account in chronological order. It also reveals general issues with an account, such as permission problems or negative side effects between remediations.
- Exception management: There must be a time-bound exception management system in place so that project teams can request temporary exemptions. This could be needed if a team expects a negative impact from the remediation because it depends on a vulnerable resource configuration. A time-bound exception gives the team more than the 3 weeks for non-production or 3 months for production workloads to remove the dependency.
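The pacing and exception requirements above can be illustrated with a minimal sketch. It is not VW’s implementation: the `RemediationException` record, the `select_findings` function, and the way exceptions are keyed by account and finding title are assumptions made for illustration. Only the finding keys `AwsAccountId` and `Title` come from the AWS Security Finding Format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RemediationException:
    """Hypothetical record of a time-bound exemption granted to a project team."""
    account_id: str
    finding_title: str
    expires_on: date


def select_findings(findings, exceptions, max_actions, today):
    """Return the findings to remediate in this run.

    Findings covered by a still-valid exception are skipped, and the result
    is capped at max_actions so one run never exceeds the agreed threshold.
    """
    exempt = {
        (e.account_id, e.finding_title)
        for e in exceptions
        if e.expires_on >= today  # expired exceptions no longer apply
    }
    eligible = [
        f for f in findings
        if (f["AwsAccountId"], f["Title"]) not in exempt
    ]
    return eligible[:max_actions]
```

Findings that are left out because of the cap simply remain open and are picked up by a later scheduled run.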
Overview
We implemented the core functionality based on the given requirements as a project in AWS CodeBuild, a service to build and test code with automatic scaling. The AWS CodeBuild project is in turn initiated by a scheduled rule in Amazon EventBridge, a service for building event-driven applications at scale across AWS, existing systems, or SaaS applications. The remediation logic is defined as Python code within the AWS CodeBuild project.
VW’s DPP security team chose AWS CodeBuild because of its low cost and extended run durations of up to 8 hours. The overall system cost is around 20 dollars per month—covering 1,200 AWS accounts. The scheduled approach provides us the ability to control when the remediation is running and to halt the operation if needed.
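As a rough sketch of the scheduled approach, the helper below builds the request parameters for the EventBridge `put_rule` and `put_targets` calls that would start a CodeBuild project on a cron schedule. The function name, the rule name, the target ID, and the default run hours are assumptions for illustration; they are not taken from VW’s setup.

```python
def build_schedule(rule_name, project_arn, role_arn, hours_utc=(6, 18), enabled=True):
    """Build keyword arguments for EventBridge put_rule and put_targets.

    hours_utc lists the hours (UTC) at which the CodeBuild project starts.
    Setting enabled=False yields a disabled rule, which is one way to halt
    the remediation without deleting anything.
    """
    hours = ",".join(str(h) for h in hours_utc)
    rule = {
        "Name": rule_name,
        # EventBridge cron fields: minutes hours day-of-month month day-of-week year
        "ScheduleExpression": f"cron(0 {hours} * * ? *)",
        "State": "ENABLED" if enabled else "DISABLED",
    }
    targets = {
        "Rule": rule_name,
        "Targets": [
            {
                "Id": "remediation-codebuild",  # hypothetical target ID
                "Arn": project_arn,             # ARN of the CodeBuild project
                "RoleArn": role_arn,            # role EventBridge uses to start the build
            }
        ],
    }
    return rule, targets
```

With a boto3 EventBridge client named `events`, the result could be applied with `events.put_rule(**rule)` followed by `events.put_targets(**targets)`.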
Figure 1 Automated remediation framework
The implementation of the auto-remediation functionality comprises multiple components, shown in figure 1:
- Data collector: This function gathers account information and all corresponding findings from AWS Security Hub, a service designed to help customers automate AWS security checks and centralize security alerts.
- Remediation base: This framework component provides basic integration functionality for the remediation, such as assuming sessions for account actions in AWS Identity and Access Management (AWS IAM), a service designed to securely manage identities and access to AWS services and resources, and interacting with the collected findings from AWS Security Hub. It also aggregates the findings per account, providing a per-account remediation run cycle rather than a per-finding approach. That makes actions more predictable for account owners and reduces overhead, for instance by avoiding assuming AWS IAM roles multiple times within a single remediation session. The remediation base also takes care of logging and monitoring for the remediation modules.
- Remediation module: This component contains the function that changes the resource configuration and is run in the account to resolve the identified security issues. The configuration changes are specifically tailored to the controls they remediate. Multiple modules can be defined within the framework.
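The per-account run cycle described for the remediation base could look roughly like the sketch below. The function names and the `open_session` and `remediate` callables are hypothetical stand-ins for framework internals that the post does not show. The point is only that findings are grouped by `AwsAccountId` so that a session is opened once per account.

```python
from collections import defaultdict


def group_findings_by_account(findings):
    """Group Security Hub findings by the account they belong to."""
    grouped = defaultdict(list)
    for finding in findings:
        grouped[finding["AwsAccountId"]].append(finding)
    return dict(grouped)


def run_per_account(findings, open_session, remediate):
    """Remediate findings account by account.

    open_session(account_id) and remediate(session, finding) are supplied by
    the caller. The session is created once per account rather than once per
    finding, and the returned log keeps each account's actions together.
    """
    log = []
    for account_id, account_findings in group_findings_by_account(findings).items():
        session = open_session(account_id)  # one role assumption per account
        for finding in account_findings:
            log.append((account_id, remediate(session, finding)))
    return log
```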
Solution
The modular approach adopted by VW helps make it faster and easier to create new remediations. New remediation modules are written in Python. A module contains only the finding title, which is used to filter the findings, and the changes that remediate the vulnerable configuration of the resource. The remediation base framework handles everything else, including setting up account sessions and gathering security findings from AWS Security Hub. The module relies on the framework for this base functionality, which offloads recurring actions and provides the needed data. This reduces the time to develop new remediations.
Example module structure in Python:
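A minimal sketch of what such a module might look like, assuming a hypothetical `RemediationModule` base class with a `finding_title` attribute and a `remediate` method. The class names, the method signature, and the finding title are illustrative and not VW’s actual framework interface. The `session` argument is expected to behave like a boto3 session, and `put_public_access_block` is the Amazon S3 API for blocking public access on a bucket.

```python
class RemediationModule:
    """Hypothetical base class; the framework's real interface may differ."""

    finding_title = ""

    def matches(self, finding):
        """Select findings by their Security Hub title."""
        return finding.get("Title") == self.finding_title

    def remediate(self, session, finding):
        raise NotImplementedError


class BlockPublicS3Access(RemediationModule):
    """Example module: block public access on the S3 buckets named in a finding."""

    finding_title = "S3 buckets should block public access"  # illustrative title

    def remediate(self, session, finding):
        s3 = session.client("s3")  # session is provided by the remediation base
        for resource in finding["Resources"]:
            # S3 bucket ARNs have the form arn:aws:s3:::bucket-name
            bucket = resource["Id"].split(":::")[-1]
            s3.put_public_access_block(
                Bucket=bucket,
                PublicAccessBlockConfiguration={
                    "BlockPublicAcls": True,
                    "IgnorePublicAcls": True,
                    "BlockPublicPolicy": True,
                    "RestrictPublicBuckets": True,
                },
            )
```

In this sketch the module holds only a title and a fix, while session handling and finding retrieval stay outside it.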
The complete solution consists of the remediation base framework and one or more remediation modules. All remediation modules are run sequentially to lower the risk of side effects or throttling by the AWS service APIs. This approach also facilitates better control and traceability in the event of errors or unexpected behavior in the remediation process, because the actions are logged sequentially as well.
Figure 2 Automated remediation architecture
It takes only a few steps to run an automated remediation, as shown in figure 2:
- The Python code is deployed in the resource account of the DPP security team. This account contains a pipeline in AWS CodePipeline, which automates continuous delivery pipelines for fast and reliable updates, and the pipeline then deploys an AWS CodeBuild job in the Security and Audit account.
- The deployed AWS CodeBuild job runs according to a scheduled Amazon EventBridge rule defined by the DPP security team. For example, it runs twice a day at specific times.
- The triggered AWS CodeBuild job starts the process of data collection to obtain the necessary target account IDs and account tags. This information could be sourced from a variety of places, such as an external custom toolset, the AWS Organizations API, or a file stored in Amazon Simple Storage Service (Amazon S3), object storage built to retrieve any amount of data from anywhere. The source depends on the existing landing zone management that provides this data and is not part of the automated remediation solution. As an example, Volkswagen maintains an account discovery tool that provides this information centrally through an API.
- Once the account information is obtained, the auto-remediation framework collects and aggregates findings from AWS Security Hub for each account using the get_findings API.
- The remediation base framework then assumes a predefined role in the landing zone account that permits changes to the resource configurations, and initializes an AWS IAM session with this role in the affected account. Following the principle of least privilege, the assumed role is limited to performing remediation actions only on the affected resources and services.
- Subsequently, the auto-remediation framework passes the initiated IAM session to the remediation modules, which run the remediation actions in the CodeBuild job. Additionally, each remediation module receives the AWS Security Hub finding details from the framework, filtered by the affected account and finding title. These finding details contain the information about the affected resource that the module will remediate.
- After the remediation module fixes a misconfiguration, the remediation base framework updates the AWS Security Hub finding with a custom metadata field showing when the fix happened and a flag indicating that it was done automatically. The finding is also marked as resolved after the successful remediation so that the change is reflected in other tools and in the AWS Security Hub dashboard of the affected AWS account.
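The AWS API calls behind these steps can be sketched as follows. This is an illustration under assumptions, not the framework’s code: the function names, the role session name, and the custom field names `AutoRemediated` and `AutoRemediatedAt` are hypothetical. The clients are passed in as arguments, in practice `boto3.client("securityhub")` and `boto3.client("sts")`, and the calls used (`get_findings` through a paginator, `assume_role`, and `batch_update_findings`) are existing boto3 operations.

```python
def collect_findings(securityhub, account_id, title):
    """Fetch active, unresolved findings for one account and one finding title."""
    filters = {
        "AwsAccountId": [{"Value": account_id, "Comparison": "EQUALS"}],
        "Title": [{"Value": title, "Comparison": "EQUALS"}],
        "WorkflowStatus": [{"Value": "NEW", "Comparison": "EQUALS"}],
        "RecordState": [{"Value": "ACTIVE", "Comparison": "EQUALS"}],
    }
    findings = []
    # get_findings returns results in pages, so iterate with a paginator
    for page in securityhub.get_paginator("get_findings").paginate(Filters=filters):
        findings.extend(page["Findings"])
    return findings


def assume_remediation_role(sts, account_id, role_name):
    """Assume the predefined remediation role and return temporary credentials."""
    response = sts.assume_role(
        RoleArn=f"arn:aws:iam::{account_id}:role/{role_name}",
        RoleSessionName="auto-remediation",  # hypothetical session name
    )
    return response["Credentials"]


def mark_resolved(securityhub, finding, fixed_at):
    """Flag a finding as automatically remediated and set it to RESOLVED."""
    securityhub.batch_update_findings(
        FindingIdentifiers=[
            {"Id": finding["Id"], "ProductArn": finding["ProductArn"]}
        ],
        Workflow={"Status": "RESOLVED"},
        UserDefinedFields={
            "AutoRemediated": "true",                  # hypothetical field name
            "AutoRemediatedAt": fixed_at.isoformat(),  # hypothetical field name
        },
    )
```

The returned credentials can be used to create a boto3 session for the target account, which is then handed to the remediation modules.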
We use a selective approach to automatically determine which remediation actions will be applied. Key criteria for selecting an AWS Security Hub finding include the following:
- Risk: risk and severity of the misconfiguration
- Volume: number of open findings in the landing zone
- Customer impact: amount of time a project team needs to resolve the finding
Conclusion
VW, with the support of AWS Professional Services, which helps companies achieve their desired business outcomes with AWS, developed a more robust and efficient process for considering potential new remediations for inclusion in the automated remediation module stack. This process serves as an extension to the Architectural Decision Record (ADR) approach, facilitating the selection of remediation actions that meet stringent criteria among the considered candidates. An ADR, which can be created by any security architect, must include data points for key identifiers, potential risks, and pseudo-code implementations to support the proposed remediation. The ADR document is reviewed and approved by the DPP security team to maintain high quality standards while identifying and minimizing potential risks. This helps prevent incorrect decisions that could impact all landing zone accounts.
This more automated approach has improved the overall security posture and helped reduce risks. It has also lowered the operational burden on both VW’s project teams and its security team.
For more details on how AWS can assist with your automated remediation journey, contact your AWS account representative. AWS Professional Services can help you realize your desired outcomes.