Networking & Content Delivery

Scaling AWS VPN maintenance with tunnel endpoint lifecycle automation

Amazon Web Services (AWS) Site-to-Site VPN is a fully managed service that creates a secure connection between your data center or branch office and your AWS resources using IP Security (IPsec) tunnels. Each Site-to-Site VPN connection consists of two VPN tunnels for redundancy. As a managed service, Site-to-Site VPN periodically applies updates to your VPN tunnel endpoints, and these updates might happen during your business hours. Updates are performed for various reasons, such as applying general upgrades, retiring underlying hardware, or replacing an unhealthy tunnel endpoint.

During a tunnel endpoint update, AWS replaces one tunnel at a time in a VPN connection to maintain continuous connectivity. Although you might briefly lose redundancy during this process, your VPN connection remains operational through the second tunnel. Configuring both tunnels in your VPN connection is therefore crucial for high availability and reliable network connectivity.

Introduction

In March 2023, AWS announced Tunnel Endpoint Lifecycle Control, a new capability that provides better visibility into, and control over, your VPN tunnel maintenance updates. Tunnel Endpoint Lifecycle Control makes Site-to-Site VPN more flexible by letting you schedule endpoint updates at a time that aligns with your business and operational needs, before the service-mandated deadline. When you activate the feature, you receive advance notification of upcoming maintenance updates, so you can plan ahead and minimize disruption to your VPN connections. The feature is particularly useful for organizations whose workloads are sensitive to VPN tunnel state changes, or that can support only a single active tunnel at a time. It reduces the operational burden of periodic, maintenance-related tunnel endpoint replacements and gives you greater control over your network infrastructure.

This post guides you through implementing automated maintenance procedures for Site-to-Site VPN connections using the Tunnel Endpoint Lifecycle Control feature. We demonstrate how to establish automated workflows that streamline VPN tunnel updates across single and multiple AWS accounts. We explore how to use AWS services such as Amazon EventBridge, AWS Health, AWS Systems Manager, and Amazon Simple Notification Service (Amazon SNS) to create a comprehensive maintenance automation solution. We also demonstrate how to set up notifications and handle maintenance scheduling in alignment with your operational requirements. Although this post assumes basic familiarity with AWS networking services, we focus on the practical implementation steps and automation techniques rather than foundational concepts.

Enabling and verifying tunnel endpoint lifecycle control

Before diving into the automation steps, note that Tunnel Endpoint Lifecycle Control can be enabled on both new and existing VPN connections, using either the AWS Management Console or the AWS Command Line Interface (AWS CLI).

Using AWS Management Console

In the console, you can enable Tunnel Endpoint Lifecycle Control, verify that it is enabled, check for available updates, and accept a maintenance update. For detailed steps, refer to the Site-to-Site VPN Tunnel Endpoint Lifecycle Control documentation.

Figure 1 and Figure 2 are console screenshots of the VPN connection when the Tunnel Endpoint Lifecycle Control feature is enabled. Enabling this feature on an existing VPN connection typically triggers an immediate tunnel endpoint replacement. To activate the feature without initiating an immediate replacement, use the skip tunnel replacement option.

Figure 1: Tunnel Endpoint Lifecycle Control feature

Figure 2: Tunnel Endpoint Replacement

After enabling the feature, you can verify the status in the VPN connection details. Figure 3 shows the Tunnel Endpoint Lifecycle Control option set to On:

Figure 3: Tunnel Endpoint Lifecycle Control status

Using AWS CLI

To enable Tunnel Endpoint Lifecycle Control for a new VPN connection, include one TunnelOptions entry for each of the two tunnels, as follows:

aws ec2 create-vpn-connection \
--type ipsec.1 \
--customer-gateway-id cgw-001122334455aabbc \
--vpn-gateway-id vgw-1a1a1a1a1a1a2b2b2 \
--options '{"TunnelOptions": [{"EnableTunnelLifecycleControl": true}, {"EnableTunnelLifecycleControl": true}]}'
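If you provision VPN connections from Python rather than the AWS CLI, you can make the same request through boto3. The following is a minimal sketch that assumes ec2_client is a boto3 EC2 client, for example boto3.client("ec2"). The function names are hypothetical. The sketch builds one TunnelOptions entry per tunnel so that both tunnels get lifecycle control.

```python
"""Sketch: create a VPN connection with lifecycle control on both tunnels."""

# A Site-to-Site VPN connection has two tunnels.
TUNNEL_COUNT = 2


def build_vpn_options(tunnel_count: int = TUNNEL_COUNT) -> dict:
    """Return the Options payload with one TunnelOptions entry per tunnel."""
    return {
        "TunnelOptions": [
            {"EnableTunnelLifecycleControl": True} for _ in range(tunnel_count)
        ]
    }


def create_vpn_with_lifecycle_control(ec2_client, customer_gateway_id: str,
                                      vpn_gateway_id: str) -> str:
    """Create the VPN connection and return its VPN connection ID."""
    response = ec2_client.create_vpn_connection(
        Type="ipsec.1",
        CustomerGatewayId=customer_gateway_id,
        VpnGatewayId=vpn_gateway_id,
        Options=build_vpn_options(),
    )
    return response["VpnConnection"]["VpnConnectionId"]
```

For example, create_vpn_with_lifecycle_control(boto3.client("ec2"), "cgw-001122334455aabbc", "vgw-1a1a1a1a1a1a2b2b2") sends the same options as the CLI command.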

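To enable Tunnel Endpoint Lifecycle Control for an existing VPN connection, run the following command once for each tunnel, specifying that tunnel's outside IP address. To enable the feature without triggering an immediate endpoint replacement, add the --skip-tunnel-replacement option: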

aws ec2 modify-vpn-tunnel-options \
--vpn-connection-id vpn-12345678901234567 \
--vpn-tunnel-outside-ip-address 203.0.113.17 \
--tunnel-options '{"EnableTunnelLifecycleControl": true}'
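Because modify-vpn-tunnel-options works on one tunnel at a time, enabling the feature on an existing connection means repeating the call for each tunnel's outside IP address. The following Python sketch does this with boto3. It assumes ec2_client is a boto3 EC2 client, and enable_lifecycle_control and its helpers are hypothetical names.

The sketch waits for the connection to return to the available state before modifying the next tunnel. This rests on the assumption that a connection cannot accept a new modification while a previous one is still in progress. It also passes SkipTunnelReplacement=True, the API counterpart of --skip-tunnel-replacement, to avoid an immediate endpoint replacement.

```python
import time


def tunnel_outside_ips(ec2_client, vpn_connection_id):
    """Return the outside IP address of each tunnel in the VPN connection."""
    resp = ec2_client.describe_vpn_connections(VpnConnectionIds=[vpn_connection_id])
    tunnels = resp["VpnConnections"][0]["Options"]["TunnelOptions"]
    return [t["OutsideIpAddress"] for t in tunnels]


def wait_until_available(ec2_client, vpn_connection_id, poll_seconds=15,
                         timeout_seconds=900, sleep=time.sleep):
    """Poll until the connection state is 'available' or the timeout expires."""
    waited = 0
    while waited <= timeout_seconds:
        resp = ec2_client.describe_vpn_connections(VpnConnectionIds=[vpn_connection_id])
        if resp["VpnConnections"][0]["State"] == "available":
            return
        sleep(poll_seconds)
        waited += poll_seconds
    raise TimeoutError(f"{vpn_connection_id} did not return to 'available'")


def enable_lifecycle_control(ec2_client, vpn_connection_id, skip_replacement=True,
                             sleep=time.sleep):
    """Turn on lifecycle control for every tunnel, one tunnel at a time."""
    updated = []
    for ip in tunnel_outside_ips(ec2_client, vpn_connection_id):
        # Assumption: wait out any modification still in progress.
        wait_until_available(ec2_client, vpn_connection_id, sleep=sleep)
        ec2_client.modify_vpn_tunnel_options(
            VpnConnectionId=vpn_connection_id,
            VpnTunnelOutsideIpAddress=ip,
            TunnelOptions={"EnableTunnelLifecycleControl": True},
            # Same effect as the CLI --skip-tunnel-replacement option.
            SkipTunnelReplacement=skip_replacement,
        )
        updated.append(ip)
    return updated
```

The sleep parameter exists only so that the polling can be exercised without real delays.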

After enabling Tunnel Endpoint Lifecycle Control, you can check for available maintenance updates. Checking for an available Site-to-Site VPN tunnel update does not apply it, so you can still schedule the update at your convenience. To check for available updates, use the following:

aws ec2 get-vpn-tunnel-replacement-status \
--vpn-connection-id vpn-12345678901234567 \
--vpn-tunnel-outside-ip-address 203.0.113.17
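You can also script this check. The following sketch assumes a boto3 EC2 client and calls get_vpn_tunnel_replacement_status, then reads the MaintenanceDetails structure of the response. In that structure, PendingMaintenance describes the available update and MaintenanceAutoAppliedAfter is the time after which AWS applies it automatically.

The helper name pending_maintenance is hypothetical. It assumes PendingMaintenance is absent or empty when no update is pending.

```python
from datetime import datetime, timezone
from typing import Optional


def pending_maintenance(ec2_client, vpn_connection_id: str, outside_ip: str,
                        now: Optional[datetime] = None) -> Optional[dict]:
    """Return details of a pending tunnel update, or None if there is none."""
    response = ec2_client.get_vpn_tunnel_replacement_status(
        VpnConnectionId=vpn_connection_id,
        VpnTunnelOutsideIpAddress=outside_ip,
    )
    details = response.get("MaintenanceDetails", {})
    if not details.get("PendingMaintenance"):
        return None

    # boto3 returns timestamps as timezone-aware datetime objects.
    deadline = details.get("MaintenanceAutoAppliedAfter")
    now = now or datetime.now(timezone.utc)
    return {
        "update": details["PendingMaintenance"],
        "auto_applied_after": deadline,
        # Time left to apply the update yourself before it is auto-applied.
        "time_left": (deadline - now) if deadline else None,
    }
```

A scheduler can compare time_left against the next planned maintenance window to confirm that the window falls before the deadline.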

Figure 4 shows a screenshot of a VPN connection with a maintenance update available.

Figure 4: Available Maintenance

To apply the available maintenance, use the following:

aws ec2 replace-vpn-tunnel \
--vpn-connection-id vpn-12345678901234567 \
--vpn-tunnel-outside-ip-address 203.0.113.17 \
--apply-pending-maintenance
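Replacing a tunnel endpoint briefly takes that tunnel out of service. An automated job should therefore avoid replacing one tunnel while the other is already down. The following sketch assumes a boto3 EC2 client and uses hypothetical helper names. It reads the per-tunnel VgwTelemetry status from describe_vpn_connections and calls replace_vpn_tunnel with ApplyPendingMaintenance=True only when every other tunnel reports UP.

```python
def tunnel_statuses(ec2_client, vpn_connection_id: str) -> dict:
    """Map each tunnel's outside IP address to its status ('UP' or 'DOWN')."""
    response = ec2_client.describe_vpn_connections(VpnConnectionIds=[vpn_connection_id])
    telemetry = response["VpnConnections"][0].get("VgwTelemetry", [])
    return {t["OutsideIpAddress"]: t["Status"] for t in telemetry}


def replace_if_peer_up(ec2_client, vpn_connection_id: str, outside_ip: str) -> bool:
    """Apply pending maintenance to one tunnel only if the other tunnel is UP.

    Returns True if the replacement was requested, False if it was skipped.
    """
    statuses = tunnel_statuses(ec2_client, vpn_connection_id)
    peers = [ip for ip in statuses if ip != outside_ip]
    if not peers or any(statuses[ip] != "UP" for ip in peers):
        # Replacing now could leave the connection without a working tunnel.
        return False
    ec2_client.replace_vpn_tunnel(
        VpnConnectionId=vpn_connection_id,
        VpnTunnelOutsideIpAddress=outside_ip,
        ApplyPendingMaintenance=True,
    )
    return True
```

Running this for one tunnel, and for the second tunnel only after the first is back UP, mirrors the one-tunnel-at-a-time approach that AWS itself uses.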

Automating VPN tunnel maintenance notifications and endpoint updates

Along with the console and AWS CLI options, AWS Health provides notifications for VPN tunnel updates. Through a personalized AWS Health Dashboard, you can receive VPN tunnel update notifications at both the account and organization levels. Figure 5 shows an example VPN tunnel update notification. These notifications include details such as the affected AWS Regions and resources, deadlines, and instructions for applying the update. If you don't apply an update manually by the specified deadline, AWS applies it automatically, as shown in Figure 6 and Figure 7. The notifications offer a proactive way to stay informed about pending maintenance.

Figure 5: VPN Tunnel Update Notification

Figure 6: Detailed Notification

Figure 7: Affected Resources
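When these notifications are consumed programmatically, for example by an EventBridge target, the useful fields sit in the detail object of the AWS Health event. The following sketch extracts them. It assumes the standard AWS Health event shape delivered through EventBridge: detail.eventTypeCode, detail.eventDescription[].latestDescription, and detail.affectedEntities[].entityValue. The function name summarize_health_event is hypothetical.

```python
def summarize_health_event(event: dict) -> dict:
    """Extract the fields an operator needs from an AWS Health event."""
    detail = event.get("detail", {})
    descriptions = detail.get("eventDescription", [])
    affected = [
        entity["entityValue"]
        for entity in detail.get("affectedEntities", [])
        if "entityValue" in entity
    ]
    return {
        "account": event.get("account"),
        "region": event.get("region"),
        "event_type": detail.get("eventTypeCode"),
        # Fall back to the top-level resources list if no entities are present.
        "affected_resources": affected or event.get("resources", []),
        "description": descriptions[0].get("latestDescription", "") if descriptions else "",
    }
```

For an AWS_VPN_TUNNEL_UPDATE_AVAILABLE event, the result collects the account, Region, event type code, affected VPN resources, and description text in one dictionary.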

Although AWS Health notifications provide visibility into pending maintenance, organizations can further streamline their VPN maintenance process by implementing an automated solution. This solution uses several AWS services to create an end-to-end automated workflow for managing VPN tunnel maintenance across multiple accounts.

Solution overview and components

The automated solution uses the following AWS services:

  • AWS CloudFormation for creating the solution stack and the cross-account role StackSet
  • AWS Health and Amazon EventBridge for VPN tunnel maintenance alerts
  • Amazon SNS for sending maintenance and update notifications
  • Systems Manager and AWS Lambda for maintenance automation

The workflow orchestrates the following sequence of events:

  1. AWS Health detects and notifies about upcoming VPN tunnel maintenance.
  2. EventBridge matches the AWS Health event (AWS_VPN_TUNNEL_UPDATE_AVAILABLE) and sends a maintenance alert to its SNS topic target.
  3. Amazon SNS delivers the event notification to stakeholders.
  4. Users create Systems Manager maintenance windows at times that suit them and register the automation task with those windows.
  5. During the scheduled window, Systems Manager runs the Automation document, which invokes the associated Lambda function.
  6. The Lambda function assumes the role in the member account and performs the tunnel endpoint replacement.
  7. The Lambda function publishes a completion notification to the user through Amazon SNS.
  8. When maintenance is complete, the Systems Manager maintenance window displays the status returned by Lambda.

Figure 8: Solution flow diagram
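Step 2 relies on EventBridge comparing each AWS Health event against the rule's event pattern. The following Python sketch reproduces that comparison for the pattern used by the vpn-health-events-rule rule in the CloudFormation templates in this post, so you can check sample events offline.

This is a simplified model that covers only exact-value matching: a field matches if its value appears in the pattern's list, and nested objects are matched recursively. The real EventBridge matcher supports more operators and also handles array-valued event fields, which this sketch does not.

```python
# Event pattern from the vpn-health-events-rule in the CloudFormation templates.
VPN_HEALTH_PATTERN = {
    "source": ["aws.health"],
    "detail-type": ["AWS Health Event"],
    "detail": {
        "service": ["VPN"],
        "eventTypeCategory": ["accountNotification"],
        "eventTypeCode": [
            "AWS_VPN_TUNNEL_UPDATE_AVAILABLE",
            "AWS_VPN_REDUNDANCY_LOSS",
        ],
    },
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified EventBridge matching: every pattern key must match the event."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        value = event[key]
        if isinstance(expected, dict):
            # Nested object: match recursively.
            if not isinstance(value, dict) or not matches(expected, value):
                return False
        elif value not in expected:
            # Leaf: the event value must be one of the listed values.
            return False
    return True
```

Fields in the event that the pattern does not mention, such as eventArn, are ignored, just as they are by EventBridge.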

Scenario 1: For accounts within AWS Organizations (such as AWS Landing Zone setups)

When using AWS Organizations, you can centralize VPN tunnel maintenance in a single delegated account, as shown in Figure 9. This account does the following:

A. Receives all maintenance notifications in its AWS Health Dashboard

B. Manages maintenance schedules for member accounts through Systems Manager

Figure 9: AWS Organizations set up

This setup needs two CloudFormation templates:

1. Delegated account template

  • Deploys in your Delegated Admin account (such as a Networking or Shared Services account in an AWS Landing Zone)
  • Sets up central management capabilities and serves as the foundation for the automated VPN tunnel maintenance process

2. Member account template (deployed as a StackSet)

  • Deploys across all member accounts in the organization
  • Creates the necessary AWS Identity and Access Management (IAM) role allowing the delegated account to perform VPN maintenance
  • Can be deployed as a StackSet from the central account to automatically create IAM roles across all member accounts.

CloudFormation stack template for delegated account

The CloudFormation template provisions the following resources in the delegated admin account:

1. SNS resources:

– Creates an SNS topic named “vpn-tunnel-replacement-notifications”

– Sets up an email subscription for notifications (the recipient must confirm the subscription)

– Configures SNS topic policy allowing EventBridge to publish messages

2. EventBridge rule:

– Creates the rule “vpn-health-events-rule”, which monitors AWS Health events specifically for VPN service events (tunnel updates available and redundancy loss)

– Targets the SNS topic for notifications

3. IAM role:

– Creates IAM role for Systems Manager with permissions to:

  • Execute automations
  • Send commands
  • Invoke Lambda functions

– Creates IAM role for Lambda with permissions to:

  • Create Amazon CloudWatch logs
  • Assume roles
  • Publish to Amazon SNS
  • Get caller identity

4. Systems Manager automation document:

– Creates the document “VpnTunnelReplacementDocument”

– Defines parameters for:

  • Member Account ID
  • VPN Connection ID
  • VPN Tunnel Outside IP Address

– Configures automation to invoke Lambda function

5. Lambda function:

– Creates “ReplaceVpnTunnelFunction” with the Python 3.9 runtime

– Implements functionality to:

  • Assume role in member account
  • Replace VPN tunnel endpoints
  • Send success/failure notifications through Amazon SNS

– Sets a five-minute timeout

6. CloudFormation outputs:

– Lambda role Amazon Resource Name (ARN)

– SNS topic ARN

Figure 10: Output for CloudFormation stack in delegated account

The following shows the CloudFormation stack template for the delegated account:

AWSTemplateFormatVersion: '2010-09-09'
Description: "Creates resources for VPN tunnel replacement automation, including IAM roles, SNS topic, EventBridge rule, SSM document, and Lambda function."
Parameters:
  EmailSubscription:
    Type: String
    Description: "Email address to receive notifications"
    Default: ""
Resources:
  VpnTunnelReplacementTopic:
    Type: 'AWS::SNS::Topic'
    Properties:
      TopicName: 'vpn-tunnel-replacement-notifications'
      DisplayName: 'VPN Tunnel Replacement Notifications'
  VpnTunnelReplacementTopicSubscription:
    Type: 'AWS::SNS::Subscription'
    Properties:
      TopicArn: !Ref VpnTunnelReplacementTopic
      Protocol: 'email'
      Endpoint: !Ref EmailSubscription
  VpnTunnelReplacementTopicPolicy:
    Type: 'AWS::SNS::TopicPolicy'
    Properties:
      Topics:
        - !Ref VpnTunnelReplacementTopic
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: events.amazonaws.com
            Action: 'sns:Publish'
            Resource: !Ref VpnTunnelReplacementTopic
  VpnHealthEventsRule:
    Type: 'AWS::Events::Rule'
    Properties:
      Name: 'vpn-health-events-rule'
      Description: 'Rule to monitor VPN Health events for tunnel updates and redundancy loss'
      State: 'ENABLED'
      EventPattern:
        source: 
          - "aws.health"
        detail-type:
          - "AWS Health Event"
        detail:
          service:
            - "VPN"
          eventTypeCategory:
            - "accountNotification"
          eventTypeCode:
            - "AWS_VPN_TUNNEL_UPDATE_AVAILABLE"
            - "AWS_VPN_REDUNDANCY_LOSS"
      Targets:
        - Arn: !Ref VpnTunnelReplacementTopic
          Id: 'VpnHealthNotificationTarget'
  IAMRoleForSSM:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: ssm.amazonaws.com
            Action: sts:AssumeRole
      Policies:
        - PolicyName: SSMDocumentPolicy
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - ssm:StartAutomationExecution
                  - ssm:DescribeAutomationExecutions
                  - ssm:GetAutomationExecution
                  - ssm:SendCommand
                Resource: 
                  - !Sub 'arn:aws:ssm:${AWS::Region}:${AWS::AccountId}:document/VpnTunnelReplacementDocument'
                  - !Sub 'arn:aws:ssm:${AWS::Region}:${AWS::AccountId}:automation-execution/*'
        - PolicyName: LambdaInvokePolicy
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - lambda:InvokeFunction
                Resource: !GetAtt LambdaFunction.Arn

  IAMRoleForLambda:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
      Policies:
        - PolicyName: LambdaAssumeRolePolicy
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - sts:AssumeRole
                Resource: !Sub 'arn:aws:iam::*:role/vpn-endpoint-replacement-role'
        - PolicyName: LambdaSNSPublishPolicy
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - sns:Publish
                Resource: !Ref VpnTunnelReplacementTopic
              - Effect: Allow
                Action:
                  - sts:GetCallerIdentity
                Resource: '*'

  SSMDocument:
    Type: AWS::SSM::Document
    Properties:
      Name: "VpnTunnelReplacementDocument"
      DocumentType: Automation
      Content:
        schemaVersion: '0.3'
        description: Replace VPN Tunnel in a child account from a delegated account
        assumeRole: !GetAtt IAMRoleForSSM.Arn
        parameters:
          MemberAccountID:
            type: String
            description: The ID of account where tunnel endpoints will be replaced.
          VpnConnectionId:
            type: String
            description: The ID of the VPN connection in the child account.
          VpnTunnelOutsideIpAddress:
            type: String
            description: The outside IP address of the VPN tunnel.
        mainSteps:
          - name: InvokeLambdaFunction
            action: aws:invokeLambdaFunction
            isEnd: true
            inputs:
              FunctionName: !Ref LambdaFunction
              Payload: |-
                {
                  "MemberAccountID": "{{MemberAccountID}}",
                  "VpnConnectionId": "{{VpnConnectionId}}",
                  "VpnTunnelOutsideIpAddress": "{{VpnTunnelOutsideIpAddress}}"
                }

  LambdaFunction:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName: ReplaceVpnTunnelFunction
      Handler: index.lambda_handler
      Runtime: python3.9
      Timeout: 300
      Environment:
        Variables:
          SNS_TOPIC_ARN: !Ref VpnTunnelReplacementTopic
      Code:
        ZipFile: |
          import boto3
          import json
          import os
          from datetime import datetime

          def get_aws_account_id():
              sts_client = boto3.client('sts')
              return sts_client.get_caller_identity()['Account']

          def send_sns_notification(message, subject):
              sns_client = boto3.client('sns')
              try:
                  sns_client.publish(
                      TopicArn=os.environ['SNS_TOPIC_ARN'],
                      Message=message,
                      Subject=subject
                  )
              except Exception as e:
                  print(f"Failed to send SNS notification: {str(e)}")

          def lambda_handler(event, context):
              # Get AWS account information
              delegated_account_id = get_aws_account_id()
              aws_region = context.invoked_function_arn.split(":")[3]
              timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")

              # Initialize STS client
              sts_client = boto3.client('sts')

              # Read the member account ID before the try block so that the
              # failure handler below can still reference it if assume_role fails
              member_account_id = event['MemberAccountID']

              try:
                  # Assume the role in the member account
                  assumed_role = sts_client.assume_role(
                      RoleArn=f"arn:aws:iam::{member_account_id}:role/vpn-endpoint-replacement-role",
                      RoleSessionName="VpnTunnelReplacementSession"
                  )

                  # Extract temporary credentials from the assumed role
                  credentials = assumed_role['Credentials']

                  # Create EC2 client using the temporary credentials from the assumed role
                  ec2_client = boto3.client(
                      'ec2',
                      aws_access_key_id=credentials['AccessKeyId'],
                      aws_secret_access_key=credentials['SecretAccessKey'],
                      aws_session_token=credentials['SessionToken']
                  )

                  # Replace the VPN tunnel using the temporary credentials
                  response = ec2_client.replace_vpn_tunnel(
                      VpnConnectionId=event['VpnConnectionId'],
                      VpnTunnelOutsideIpAddress=event['VpnTunnelOutsideIpAddress'],
                      ApplyPendingMaintenance=True
                  )
                  
                  # Prepare success message
                  success_message = {
                      "status": "SUCCESS",
                      "timestamp": timestamp,
                      "execution_details": {
                          "delegated_account_id": delegated_account_id,
                          "member_account_id": member_account_id,
                          "aws_region": aws_region
                      },
                      "vpn_details": {
                          "vpn_connection_id": event['VpnConnectionId'],
                          "tunnel_address": event['VpnTunnelOutsideIpAddress']
                      },
                      "operation_response": {
                          "status": response['ResponseMetadata']['HTTPStatusCode'],
                          "request_id": response['ResponseMetadata']['RequestId']
                      }
                  }
                  
                  # Send success notification
                  send_sns_notification(
                      json.dumps(success_message, indent=2),
                      f"VPN Tunnel Replacement Successful - {event['VpnConnectionId']}"
                  )
                  
                  return {
                      "statusCode": 200,
                      "body": json.dumps(response)
                  }
                  
              except Exception as e:
                  # Prepare error message
                  error_message = {
                      "status": "FAILED",
                      "timestamp": timestamp,
                      "execution_details": {
                          "delegated_account_id": delegated_account_id,
                          "member_account_id": member_account_id,
                          "aws_region": aws_region
                      },
                      "vpn_details": {
                          "vpn_connection_id": event['VpnConnectionId'],
                          "tunnel_address": event['VpnTunnelOutsideIpAddress']
                      },
                      "error_details": {
                          "error_message": str(e),
                          "error_type": type(e).__name__
                      }
                  }
                  
                  # Send failure notification
                  send_sns_notification(
                      json.dumps(error_message, indent=2),
                      f"VPN Tunnel Replacement Failed - {event['VpnConnectionId']}"
                  )
                  
                  return {
                      "statusCode": 500,
                      "body": str(e)
                  }
      Role: !GetAtt IAMRoleForLambda.Arn

Outputs:
  LambdaRoleArn:
    Description: The ARN of the IAM Role for the Lambda function.
    Value: !GetAtt IAMRoleForLambda.Arn
  SNSTopicArn:
    Description: "The ARN of the SNS Topic for notifications"
    Value: !Ref VpnTunnelReplacementTopic
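After the stack is deployed, you can start the VpnTunnelReplacementDocument Automation document on demand. This lets you test the end-to-end flow before relying on a maintenance window.

The following sketch assumes a boto3 Systems Manager client in the delegated account, and start_tunnel_replacement is a hypothetical helper. It performs basic checks on the three inputs and then calls start_automation_execution, passing each Automation parameter as a list of strings. Because the document sets assumeRole, the caller generally also needs iam:PassRole permission on that role.

```python
import ipaddress
import re


def start_tunnel_replacement(ssm_client, member_account_id: str,
                             vpn_connection_id: str, outside_ip: str,
                             document_name: str = "VpnTunnelReplacementDocument") -> str:
    """Start one Automation execution and return its execution ID."""
    # Basic input checks before anything runs against a member account.
    if not re.fullmatch(r"\d{12}", member_account_id):
        raise ValueError(f"invalid account ID: {member_account_id!r}")
    if not vpn_connection_id.startswith("vpn-"):
        raise ValueError(f"invalid VPN connection ID: {vpn_connection_id!r}")
    ipaddress.IPv4Address(outside_ip)  # raises a ValueError subclass if malformed

    response = ssm_client.start_automation_execution(
        DocumentName=document_name,
        # Automation parameters are passed as lists of strings.
        Parameters={
            "MemberAccountID": [member_account_id],
            "VpnConnectionId": [vpn_connection_id],
            "VpnTunnelOutsideIpAddress": [outside_ip],
        },
    )
    return response["AutomationExecutionId"]
```

A successful run applies any pending maintenance to the tunnel. Test against a connection where a brief loss of redundancy is acceptable.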

CloudFormation StackSet template for cross-account access

This StackSet template creates the necessary IAM role and permissions in each member account, so that the Lambda function in the delegated admin account can assume the role and perform VPN tunnel replacements in that account.

1. IAM role:

– Creates an IAM role named “vpn-endpoint-replacement-role”

2. Role trust relationship:

– Configures the trust relationship so that the specified Lambda execution role(s) can assume the role

3. IAM policy:

– Attaches an inline policy named “VPNReplacementPolicy” to the role. This policy grants permissions to:

  • Describe VPN connections
  • Replace VPN tunnels

4. Parameter:

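– LambdaARNToAssumeIAMRole: Accepts the ARN of the Lambda execution role that assumes this role (the LambdaRoleArn output of the delegated account stack). To trust several roles, provide their ARNs as a comma-separated list.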
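The trust policy is built by splitting LambdaARNToAssumeIAMRole on commas, so one StackSet can trust several delegated-account Lambda roles. The following sketch mirrors what the template's Fn::Split produces, together with the role ARN that the Lambda function assumes. The function names are hypothetical.

Like Fn::Split, the sketch does not trim whitespace. The parameter value should therefore not contain spaces after the commas.

```python
ROLE_NAME = "vpn-endpoint-replacement-role"


def trust_policy(lambda_role_arns: str) -> dict:
    """Mirror the template's Fn::Split of the comma-separated parameter."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            # str.split, like Fn::Split, keeps any surrounding whitespace.
            "Principal": {"AWS": lambda_role_arns.split(",")},
            "Action": ["sts:AssumeRole"],
        }],
    }


def member_role_arn(member_account_id: str) -> str:
    """ARN that the delegated-account Lambda function passes to sts:AssumeRole."""
    return f"arn:aws:iam::{member_account_id}:role/{ROLE_NAME}"
```

The role name must match on both sides: the member template hardcodes vpn-endpoint-replacement-role, and the Lambda function builds the same ARN from MemberAccountID.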

The following screenshots show the StackSet deployment and the Systems Manager maintenance window configuration.

Step 1: Organization StackSet input details

Figure 11: Organization StackSet details

Step 2: Set the deployment options for the StackSet.

Figure 12: Organization StackSet deployment target

Step 3: Verify the StackSet instance and operation status for a successful deployment.

Figure 13: StackSet instances status

Figure 14: StackSet operation status
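Steps 1 through 3 can also be scripted. The following sketch assumes a boto3 CloudFormation client in the account that administers the StackSet, and the helper and StackSet names are hypothetical. It creates a service-managed StackSet with automatic deployment to new accounts, then deploys it to the given organizational units.

The sketch deliberately targets a single Region. The template hardcodes the IAM role name, and IAM role names are global within an account, so a second stack instance in another Region of the same account would conflict. Set call_as to "DELEGATED_ADMIN" when the calling account is a registered StackSets delegated administrator rather than the management account.

```python
def deploy_member_role_stackset(cfn_client, template_body: str, lambda_role_arn: str,
                                ou_ids: list, region: str,
                                stack_set_name: str = "vpn-endpoint-replacement-role",
                                call_as: str = "SELF") -> str:
    """Create the StackSet and its stack instances; return the operation ID."""
    cfn_client.create_stack_set(
        StackSetName=stack_set_name,
        TemplateBody=template_body,
        Parameters=[{
            "ParameterKey": "LambdaARNToAssumeIAMRole",
            "ParameterValue": lambda_role_arn,
        }],
        # The template sets RoleName, so it needs CAPABILITY_NAMED_IAM.
        Capabilities=["CAPABILITY_NAMED_IAM"],
        PermissionModel="SERVICE_MANAGED",
        # Deploy automatically to accounts that later join the target OUs.
        AutoDeployment={"Enabled": True, "RetainStacksOnAccountRemoval": False},
        CallAs=call_as,
    )
    response = cfn_client.create_stack_instances(
        StackSetName=stack_set_name,
        DeploymentTargets={"OrganizationalUnitIds": ou_ids},
        Regions=[region],  # one Region only: IAM role names are global
        CallAs=call_as,
    )
    return response["OperationId"]
```

The returned operation ID can be tracked in the StackSet operations view shown in Figure 14.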

Step 4: Systems Manager maintenance window creation.

Figure 15: Systems Manager maintenance window creation

Step 5: Registering the Systems Manager automation task and target with the maintenance window.

Figure 16: Systems Manager task creation

Figure 17: Systems Manager task Parameters
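Steps 4 and 5 can likewise be done with the AWS SDK. The following sketch assumes a boto3 Systems Manager client in the delegated account; the helper and window names are hypothetical. It creates a maintenance window and registers VpnTunnelReplacementDocument as an Automation task without targets, passing the document's three parameters.

The service_role_arn argument is an assumption: a role you supply that Systems Manager uses to run the task. Pick a schedule that falls before the MaintenanceAutoAppliedAfter deadline reported for the tunnel; otherwise AWS applies the update first.

```python
def schedule_tunnel_replacement(ssm_client, schedule: str, member_account_id: str,
                                vpn_connection_id: str, outside_ip: str,
                                service_role_arn: str,
                                document_name: str = "VpnTunnelReplacementDocument") -> str:
    """Create a maintenance window, register the Automation task, return the window ID."""
    window = ssm_client.create_maintenance_window(
        Name=f"vpn-maintenance-{vpn_connection_id}",
        Schedule=schedule,          # for example "cron(0 2 ? * SUN *)"
        Duration=2,                 # window length in hours
        Cutoff=1,                   # stop starting new tasks 1 hour before the end
        AllowUnassociatedTargets=True,
    )
    window_id = window["WindowId"]
    ssm_client.register_task_with_maintenance_window(
        WindowId=window_id,
        TaskArn=document_name,      # for AUTOMATION tasks, the document name
        TaskType="AUTOMATION",
        ServiceRoleArn=service_role_arn,
        TaskInvocationParameters={
            "Automation": {
                "DocumentVersion": "$DEFAULT",
                "Parameters": {
                    "MemberAccountID": [member_account_id],
                    "VpnConnectionId": [vpn_connection_id],
                    "VpnTunnelOutsideIpAddress": [outside_ip],
                },
            }
        },
        # No Targets: the document acts on its parameters, not on instances.
        MaxConcurrency="1",
        MaxErrors="1",
    )
    return window_id
```

For the standalone template, drop MemberAccountID and pass the name of the document that the standalone stack creates.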

Step 6: When the Systems Manager maintenance window runs, the user receives an email notification through Amazon SNS.

Figure 18: Notification sent to user

The following shows the CloudFormation StackSet template for cross-account access:

AWSTemplateFormatVersion: "2010-09-09"
Description: "Creates an IAM role with permissions to describe VPN connections and replace VPN tunnels. This role can be assumed by a specified Lambda function to perform VPN tunnel replacement operations."
Parameters:
  LambdaARNToAssumeIAMRole:
    Description: "The ARN of the Lambda IAM role that assumes the VPN tunnel replacement role. Refer to the delegated account stack output for the ARN."
    Type: String
Resources:
  MemberAccountRole:
    Type: "AWS::IAM::Role"
    Properties:
      RoleName: "vpn-endpoint-replacement-role"
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
        - Effect: "Allow"
          Principal:
            AWS:
              Fn::Split:
                - ","
                - !Ref LambdaARNToAssumeIAMRole
          Action:
          - "sts:AssumeRole"
      Path: "/"
      Policies:
        - PolicyName: "VPNReplacementPolicy"
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
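              - Effect: "Allow"
                Action:
                  - "ec2:ReplaceVpnTunnel"
                Resource: !Sub "arn:aws:ec2:${AWS::Region}:${AWS::AccountId}:vpn-connection/*"
              # Describe* actions do not support resource-level permissions
              - Effect: "Allow"
                Action:
                  - "ec2:DescribeVpnConnections"
                Resource: "*"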

Scenario 2: For standalone AWS accounts

This template provides the same VPN tunnel maintenance automation capabilities but is designed for a single AWS account rather than multiple accounts. You deploy it in the standalone account, and it creates the following resources:

1. SNS resources:

– Creates an SNS topic named “vpn-tunnel-replacement-notifications”

– Sets up an email subscription for notifications (the recipient must confirm the subscription)

– Configures SNS topic policy allowing EventBridge to publish messages

2. EventBridge rule:

– Creates the rule “vpn-health-events-rule”

– Monitors AWS Health events specifically for VPN service events (tunnel updates and redundancy loss)

– Targets the SNS topic for notifications

3. IAM role for Lambda:

– Attaches the AWSLambdaBasicExecutionRole managed policy

– Includes custom policy allowing:

  • VPN tunnel replacement
  • VPN connection description
  • Amazon SNS publishing

4. Systems Manager automation document:

– Creates automation document for VPN tunnel replacement

– Defines parameters:

  • VPN Connection ID
  • VPN Tunnel Outside IP Address

5. Lambda function:

– Creates “ReplaceVpnTunnelFunction” with the Python 3.9 runtime

– Implements functionality to:

  • Replace VPN tunnel endpoints
  • Send success/failure notifications through Amazon SNS

– Sets a five-minute timeout

6. CloudFormation outputs:

  • Lambda function name
  • Lambda role ARN
  • Systems Manager document name
  • SNS topic ARN

CloudFormation template for standalone AWS accounts

Use the following CloudFormation template to deploy the automation solution in a standalone AWS account.

AWSTemplateFormatVersion: "2010-09-09"
Description: "Creates resources for VPN tunnel replacement automation, including IAM roles, SNS topic, EventBridge rule, SSM document, and Lambda function."
Parameters:
  EmailSubscription:
    Type: String
    Description: "Email address to receive notifications"
    Default: ""
Resources:
  VpnTunnelReplacementTopic:
    Type: 'AWS::SNS::Topic'
    Properties:
      TopicName: 'vpn-tunnel-replacement-notifications'
      DisplayName: 'VPN Tunnel Replacement Notifications'
  VpnTunnelReplacementTopicSubscription:
    Type: 'AWS::SNS::Subscription'
    Properties:
      TopicArn: !Ref VpnTunnelReplacementTopic
      Protocol: 'email'
      Endpoint: !Ref EmailSubscription
  VpnTunnelReplacementTopicPolicy:
    Type: 'AWS::SNS::TopicPolicy'
    Properties:
      Topics:
        - !Ref VpnTunnelReplacementTopic
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: events.amazonaws.com
            Action: 'sns:Publish'
            Resource: !Ref VpnTunnelReplacementTopic
  VpnHealthEventsRule:
    Type: 'AWS::Events::Rule'
    Properties:
      Name: 'vpn-health-events-rule'
      Description: 'Rule to monitor VPN Health events for tunnel updates and redundancy loss'
      State: 'ENABLED'
      EventPattern:
        source: 
          - "aws.health"
        detail-type:
          - "AWS Health Event"
        detail:
          service:
            - "VPN"
          eventTypeCategory:
            - "accountNotification"
          eventTypeCode:
            - "AWS_VPN_TUNNEL_UPDATE_AVAILABLE"
            - "AWS_VPN_REDUNDANCY_LOSS"
      Targets:
        - Arn: !Ref VpnTunnelReplacementTopic
          Id: 'VpnHealthNotificationTarget'
  IAMRoleForLambda:
    Type: "AWS::IAM::Role"
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
        - Effect: "Allow"
          Principal:
            Service: "lambda.amazonaws.com"
          Action: "sts:AssumeRole"
      ManagedPolicyArns:
        - "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
      Policies:
        - PolicyName: "VPNManagementPolicy"
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
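            - Effect: "Allow"
              Action:
                - "ec2:ReplaceVpnTunnel"
              Resource: !Sub "arn:aws:ec2:${AWS::Region}:${AWS::AccountId}:vpn-connection/*"
            # Describe* actions do not support resource-level permissions
            - Effect: "Allow"
              Action:
                - "ec2:DescribeVpnConnections"
              Resource: "*"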
            - Effect: "Allow"
              Action:
                - "sns:Publish"
              Resource: !Ref VpnTunnelReplacementTopic
            - Effect: "Allow"
              Action:
                - "sts:GetCallerIdentity"
              Resource: "*"
  SSMDocument:
    Type: "AWS::SSM::Document"
    Properties:
      DocumentType: "Automation"
      Content:
        schemaVersion: "0.3"
        description: "Replace VPN Tunnel"
        parameters:
          VpnConnectionId:
            type: "String"
            description: "The ID of the VPN connection."
          VpnTunnelOutsideIpAddress:
            type: "String"
            description: "The outside IP address of the VPN tunnel."
        mainSteps:
          - name: "InvokeLambdaFunction"
            action: "aws:invokeLambdaFunction"
            isEnd: true
            inputs:
              FunctionName: !Ref LambdaFunction
              Payload: >-
                {
                  "VpnConnectionId": "{{VpnConnectionId}}",
                  "VpnTunnelOutsideIpAddress": "{{VpnTunnelOutsideIpAddress}}"
                }

  LambdaFunction:
    Type: "AWS::Lambda::Function"
    Properties:
      FunctionName: "ReplaceVpnTunnelFunction"
      Handler: "index.lambda_handler"
      Runtime: "python3.9"
      Timeout: 300
      Environment:
        Variables:
          SNS_TOPIC_ARN: !Ref VpnTunnelReplacementTopic
      Code:
        ZipFile: |
          import boto3
          import json
          import os
          from datetime import datetime

          def get_aws_account_id():
              sts_client = boto3.client('sts')
              return sts_client.get_caller_identity()['Account']

          def send_sns_notification(message, subject):
              sns_client = boto3.client('sns')
              try:
                  sns_client.publish(
                      TopicArn=os.environ['SNS_TOPIC_ARN'],
                      Message=message,
                      Subject=subject
                  )
              except Exception as e:
                  print(f"Failed to send SNS notification: {str(e)}")

          def lambda_handler(event, context):
              # Get AWS account information
              account_id = get_aws_account_id()
              aws_region = context.invoked_function_arn.split(":")[3]
              timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")

              # Create EC2 client
              ec2_client = boto3.client('ec2')

              try:
                  # Replace the VPN tunnel
                  response = ec2_client.replace_vpn_tunnel(
                      VpnConnectionId=event['VpnConnectionId'],
                      VpnTunnelOutsideIpAddress=event['VpnTunnelOutsideIpAddress'],
                      ApplyPendingMaintenance=True
                  )
                  
                  # Prepare success message
                  success_message = {
                      "status": "SUCCESS",
                      "timestamp": timestamp,
                      "execution_details": {
                          "account_id": account_id,
                          "aws_region": aws_region
                      },
                      "vpn_details": {
                          "vpn_connection_id": event['VpnConnectionId'],
                          "tunnel_address": event['VpnTunnelOutsideIpAddress']
                      },
                      "operation_response": {
                          "status": response['ResponseMetadata']['HTTPStatusCode'],
                          "request_id": response['ResponseMetadata']['RequestId']
                      }
                  }
                  
                  # Send success notification
                  send_sns_notification(
                      json.dumps(success_message, indent=2),
                      f"VPN Tunnel Replacement Successful - {event['VpnConnectionId']}"
                  )
                  
                  return {
                      "statusCode": 200,
                      "body": json.dumps(response)
                  }
              except Exception as e:
                  # Prepare error message
                  error_message = {
                      "status": "FAILED",
                      "timestamp": timestamp,
                      "execution_details": {
                          "account_id": account_id,
                          "aws_region": aws_region
                      },
                      "vpn_details": {
                          "vpn_connection_id": event['VpnConnectionId'],
                          "tunnel_address": event['VpnTunnelOutsideIpAddress']
                      },
                      "error_details": {
                          "error_message": str(e),
                          "error_type": type(e).__name__
                      }
                  }
                  
                  # Send failure notification
                  send_sns_notification(
                      json.dumps(error_message, indent=2),
                      f"VPN Tunnel Replacement Failed - {event['VpnConnectionId']}"
                  )
                  
                  return {
                      "statusCode": 500,
                      "body": str(e)
                  }
      Role: !GetAtt IAMRoleForLambda.Arn
Outputs:
  LambdaFunctionName:
    Description: "The name of the Lambda function"
    Value: !Ref LambdaFunction
  LambdaRoleArn:
    Description: "The ARN of the IAM Role for the Lambda function"
    Value: !GetAtt IAMRoleForLambda.Arn
  SSMDocumentName:
    Description: "The name of the SSM Document"
    Value: !Ref SSMDocument
  SNSTopicArn:
    Description: "The ARN of the SNS Topic for notifications"
    Value: !Ref VpnTunnelReplacementTopic
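After the stack is deployed, you can also start the runbook on demand, for example from a scheduler or an operator's workstation. The following minimal Python sketch wraps the Systems Manager `StartAutomationExecution` API, which expects each runbook parameter as a list of strings. It assumes that an SSM client object (such as `boto3.client("ssm")`) is passed in. The helper names `build_automation_parameters` and `start_tunnel_replacement` are hypothetical.

```python
"""Sketch: start the tunnel replacement runbook for a single VPN tunnel."""
import ipaddress
from typing import Any, Dict, List


def build_automation_parameters(vpn_connection_id: str,
                                tunnel_outside_ip: str) -> Dict[str, List[str]]:
    """Validate the inputs and shape them the way SSM Automation expects."""
    if not vpn_connection_id.startswith("vpn-"):
        raise ValueError(f"Unexpected VPN connection ID: {vpn_connection_id!r}")
    # Raises ValueError if the outside address is not a valid IP address.
    ipaddress.ip_address(tunnel_outside_ip)
    # Automation parameters are passed as lists of strings, even for single values.
    return {
        "VpnConnectionId": [vpn_connection_id],
        "VpnTunnelOutsideIpAddress": [tunnel_outside_ip],
    }


def start_tunnel_replacement(ssm_client: Any, document_name: str,
                             vpn_connection_id: str, tunnel_outside_ip: str) -> str:
    """Start the runbook and return the Automation execution ID."""
    response = ssm_client.start_automation_execution(
        DocumentName=document_name,
        Parameters=build_automation_parameters(vpn_connection_id, tunnel_outside_ip),
    )
    return response["AutomationExecutionId"]
```

For example, `start_tunnel_replacement(boto3.client("ssm"), "<SSMDocumentName output>", "vpn-0123456789abcdef0", "203.0.113.10")` starts one execution. The document name comes from the `SSMDocumentName` stack output. The connection ID and IP address are placeholders. Because the client is passed in, you can test the helpers without AWS credentials.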

Benefits of the automated approach with VPN endpoint lifecycle control

  • Reduced operational overhead: Eliminates manual monitoring and execution of VPN maintenance tasks
  • Consistent implementation: Applies the same standardized maintenance procedure to every VPN connection
  • Flexible scheduling: Lets teams apply pending maintenance during their preferred maintenance windows, before the service-mandated deadline
  • Cross-account management: Enables centralized management of VPN maintenance across multiple AWS accounts
  • Comprehensive audit trail: Maintains detailed logs through AWS CloudTrail, Systems Manager, and AWS Health
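Flexible scheduling reduces to a simple decision. Apply a pending update inside your own window, but never let the service deadline pass. The following Python sketch expresses one such policy. The function names, the one-day safety margin, and the use of UTC windows are assumptions. In practice, the deadline would come from the `MaintenanceAutoAppliedAfter` value that the EC2 `GetVpnTunnelReplacementStatus` API reports for a tunnel with pending maintenance.

```python
"""Sketch: decide whether to apply a pending tunnel update now (hypothetical policy)."""
from datetime import datetime, time, timedelta, timezone
from typing import Optional


def in_window(t: time, start: time, end: time) -> bool:
    """True if time-of-day t is in [start, end), allowing windows that cross midnight."""
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def should_apply_now(now: datetime, window_start: time, window_end: time,
                     auto_applied_after: Optional[datetime],
                     margin: timedelta = timedelta(days=1)) -> bool:
    """Apply inside the preferred UTC window, or immediately once the deadline is near.

    `now` and `auto_applied_after` must both be timezone-aware datetimes.
    """
    if auto_applied_after is not None and auto_applied_after - now <= margin:
        # Close to the service-mandated deadline: apply instead of waiting for AWS to.
        return True
    return in_window(now.astimezone(timezone.utc).time(), window_start, window_end)
```

For example, with a 22:00 to 02:00 UTC window, a check at 12:00 UTC returns `False` unless the deadline falls within the next 24 hours.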

Best practices for implementation

Documentation: Maintain updated runbooks and documentation

Testing: Always test the automation workflow in a non-production environment first
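One low-risk check before you test the full workflow is to confirm that the Lambda execution role can call `ReplaceVpnTunnel`. Like most EC2 actions, `ReplaceVpnTunnel` accepts a `DryRun` flag. With this flag set, EC2 doesn't perform the action. Instead, it returns a `DryRunOperation` error if the caller has permission, or an `UnauthorizedOperation` error if it doesn't. The following sketch assumes that an EC2 client (for example, `boto3.client("ec2")` created with the role's credentials) is passed in. The `can_replace_tunnel` helper is hypothetical.

```python
"""Sketch: confirm ReplaceVpnTunnel permission with a dry run (no tunnel is replaced)."""
from typing import Any


def can_replace_tunnel(ec2_client: Any, vpn_connection_id: str,
                       tunnel_outside_ip: str) -> bool:
    """Return True if the caller is allowed to call ReplaceVpnTunnel for this tunnel."""
    try:
        ec2_client.replace_vpn_tunnel(
            VpnConnectionId=vpn_connection_id,
            VpnTunnelOutsideIpAddress=tunnel_outside_ip,
            ApplyPendingMaintenance=True,
            DryRun=True,  # EC2 checks permissions but does not perform the action
        )
    except Exception as exc:  # botocore.exceptions.ClientError in practice
        code = getattr(exc, "response", {}).get("Error", {}).get("Code")
        if code == "DryRunOperation":
            return True  # the request would have been allowed
        if code == "UnauthorizedOperation":
            return False  # the caller lacks permission
        raise  # any other error (for example, an invalid ID) is surfaced as-is
    # EC2 signals dry-run success with an error, so a normal return is unexpected.
    return False
```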

Monitoring:

  • Implement comprehensive monitoring using CloudWatch
  • Configure dead-letter queues (DLQs) for asynchronously invoked Lambda functions, such as those triggered by EventBridge, to capture and analyze failed executions
  • Set up alarms for DLQ message arrivals to quickly identify failures
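For the DLQ alarm, a common approach is to alarm on the `ApproximateNumberOfMessagesVisible` metric that Amazon SQS publishes to CloudWatch in the `AWS/SQS` namespace. The following sketch builds the arguments for the CloudWatch `PutMetricAlarm` API. It assumes that an SQS queue is already configured as the DLQ. The queue name, the alarm name, and the SNS topic used as the alarm action are hypothetical.

```python
"""Sketch: CloudWatch alarm that fires when any message reaches the DLQ."""
from typing import Any, Dict


def dlq_alarm_params(queue_name: str, alarm_topic_arn: str) -> Dict[str, Any]:
    """Build put_metric_alarm arguments for a 'DLQ is not empty' alarm."""
    return {
        "AlarmName": f"{queue_name}-messages-visible",
        "AlarmDescription": "Failed VPN maintenance invocations reached the DLQ",
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",  # alarm on one or more messages
        "TreatMissingData": "notBreaching",
        "AlarmActions": [alarm_topic_arn],
    }


def create_dlq_alarm(cloudwatch_client: Any, queue_name: str, alarm_topic_arn: str) -> None:
    """Create or update the alarm through a CloudWatch client."""
    cloudwatch_client.put_metric_alarm(**dlq_alarm_params(queue_name, alarm_topic_arn))
```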

Notifications: Configure detailed notifications for both successful and failed maintenance operations
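To act on these notifications programmatically (for example, to forward them to a chat channel or a ticketing system), a subscriber can parse the JSON message that the template's Lambda function publishes. The following sketch assumes a Lambda function subscribed to the SNS topic, whose event carries each message under `Records[].Sns.Message`. The `summarize_notifications` helper is hypothetical, and its output format is arbitrary.

```python
"""Sketch: summarize the JSON notifications published by the replacement function."""
import json
from typing import Any, Dict, List


def summarize_notifications(sns_event: Dict[str, Any]) -> List[str]:
    """Turn the records of an SNS-triggered Lambda event into one-line summaries."""
    summaries: List[str] = []
    for record in sns_event.get("Records", []):
        # The replacement function publishes json.dumps() of its status dictionary.
        body = json.loads(record["Sns"]["Message"])
        vpn = body["vpn_details"]
        line = (f'{body["status"]}: {vpn["vpn_connection_id"]} '
                f'tunnel {vpn["tunnel_address"]} at {body["timestamp"]}')
        if body["status"] == "FAILED":
            error = body["error_details"]
            line += f' ({error["error_type"]}: {error["error_message"]})'
        summaries.append(line)
    return summaries
```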

Security and access control:

  • Implement least-privilege access for all roles and permissions
  • Encrypt SNS topics using AWS Key Management Service (AWS KMS) to protect sensitive message content
  • Enable encryption for Lambda environment variables using AWS KMS keys
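As an illustration of least privilege, the following sketch builds an IAM policy document for the tunnel replacement function. The policy allows `ec2:ReplaceVpnTunnel` only on specific VPN connections and `sns:Publish` only on the notification topic. If the topic is encrypted with a customer managed AWS KMS key, publishers also need `kms:GenerateDataKey` and `kms:Decrypt` on that key. The helper name and its parameters are hypothetical, and the ARNs assume the standard `aws` partition. Before you use the policy, confirm resource-level support for each action in the Service Authorization Reference. CloudWatch Logs permissions for the function aren't shown.

```python
"""Sketch: least-privilege IAM policy for the tunnel replacement function."""
from typing import Any, Dict, List, Optional


def replacement_function_policy(region: str, account_id: str,
                                vpn_connection_ids: List[str], topic_arn: str,
                                kms_key_arn: Optional[str] = None) -> Dict[str, Any]:
    """Return an IAM policy document scoped to specific VPN connections and one topic."""
    vpn_arns = [f"arn:aws:ec2:{region}:{account_id}:vpn-connection/{vpn_id}"
                for vpn_id in vpn_connection_ids]
    statements: List[Dict[str, Any]] = [
        {"Sid": "ReplaceListedTunnels", "Effect": "Allow",
         "Action": "ec2:ReplaceVpnTunnel", "Resource": vpn_arns},
        {"Sid": "PublishNotifications", "Effect": "Allow",
         "Action": "sns:Publish", "Resource": topic_arn},
    ]
    if kms_key_arn:
        # Needed to publish to an SNS topic encrypted with a customer managed KMS key.
        statements.append({"Sid": "UseTopicKey", "Effect": "Allow",
                           "Action": ["kms:GenerateDataKey", "kms:Decrypt"],
                           "Resource": kms_key_arn})
    return {"Version": "2012-10-17", "Statement": statements}
```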

This solution is AWS Region-specific. If you have VPN connections in multiple AWS Regions, then deploy the same CloudFormation templates in each Region where you need to manage VPN maintenance.
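To repeat the deployment across Regions, you can loop over a list of Regions and create the same stack in each one. The following sketch calls the CloudFormation `CreateStack` API once per Region. It assumes a factory function that returns a CloudFormation client for a given Region. The stack name and the `deploy_to_regions` helper are hypothetical.

```python
"""Sketch: deploy the same CloudFormation template to several Regions."""
from typing import Any, Callable, Dict, Iterable


def deploy_to_regions(client_for_region: Callable[[str], Any], regions: Iterable[str],
                      stack_name: str, template_body: str) -> Dict[str, str]:
    """Create one stack per Region and return a mapping of Region to stack ID."""
    stack_ids: Dict[str, str] = {}
    for region in regions:
        cfn = client_for_region(region)
        response = cfn.create_stack(
            StackName=stack_name,
            TemplateBody=template_body,
            # The template creates an IAM role, so IAM capabilities must be acknowledged.
            Capabilities=["CAPABILITY_NAMED_IAM"],
        )
        stack_ids[region] = response["StackId"]
    return stack_ids
```

For example, you could pass `lambda r: boto3.client("cloudformation", region_name=r)` as the factory. IAM names are global, so if a template gives an IAM resource a fixed name, the deployment in the second Region fails with a name conflict. Lambda function names are Regional, so they don't conflict.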

Cleaning up

Follow these steps to clean up your resources:

1. Delete StackSet instances (for multi-account setup)

  • Open the CloudFormation console
  • Go to the StackSets section
  • Choose the StackSet that you created for the linked accounts
  • Choose Delete stacks from StackSet
  • Choose all accounts and AWS Regions where you deployed the StackSet instances
  • Confirm and wait for all instances to be deleted

2. Delete the StackSet (for multi-account setup)

  • After all instances are deleted, choose the StackSet again
  • Choose Delete StackSet
  • Confirm the deletion

3. Delete the CloudFormation stack in the delegated admin account

  • Open the CloudFormation console
  • Choose the stack that you created for the delegated admin solution
  • Choose Delete and confirm the deletion
  • Wait for the stack deletion to complete (this removes all resources created by the template)

4. Delete the CloudFormation stack in the standalone account (if applicable)

  • Open the CloudFormation console
  • Choose the stack that you created for the standalone account
  • Choose Delete and confirm the deletion
  • Wait for the stack deletion to complete

5. Verify resource deletion

After stack deletions are complete, verify that the associated resources have been removed:

  • Confirm that the Lambda functions are deleted
  • Verify that the SNS topics are deleted
  • Make sure that the IAM roles are removed
  • Confirm that the EventBridge rules are deleted
  • Check that the Systems Manager documents are removed

6. Remove any manual configurations

  • If you made any manual adjustments or created other resources, then make sure that these are also cleaned up.
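You can also script steps 1 through 4. A StackSet can't be deleted while it still contains stack instances, so the following sketch deletes the instances first and waits for the operation to finish before it deletes the StackSet. It uses the CloudFormation `DeleteStackInstances`, `DescribeStackSetOperation`, `DeleteStackSet`, and `DeleteStack` APIs and assumes that a CloudFormation client is passed in. The helper names are hypothetical. For a self-managed StackSet, pass `{"Accounts": [...]}` as the targets. For a service-managed StackSet, pass `{"DeploymentTargets": {"OrganizationalUnitIds": [...]}}`. Use `call_as="DELEGATED_ADMIN"` when running from a delegated administrator account.

```python
"""Sketch: tear down the StackSet, then delete the remaining stack."""
import time
from typing import Any, Dict, List

# StackSet operation states after which polling can stop.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED"}


def wait_for_stack_set_operation(cfn: Any, stack_set_name: str, operation_id: str,
                                 call_as: str, poll_seconds: float) -> str:
    """Poll DescribeStackSetOperation until the operation reaches a terminal state."""
    while True:
        response = cfn.describe_stack_set_operation(
            StackSetName=stack_set_name, OperationId=operation_id, CallAs=call_as)
        status = response["StackSetOperation"]["Status"]
        if status in TERMINAL_STATES:
            return status
        time.sleep(poll_seconds)


def delete_stack_set_completely(cfn: Any, stack_set_name: str, targets: Dict[str, Any],
                                regions: List[str], call_as: str = "SELF",
                                poll_seconds: float = 15.0) -> None:
    """Delete all stack instances (and their stacks), then the StackSet itself."""
    operation_id = cfn.delete_stack_instances(
        StackSetName=stack_set_name,
        Regions=regions,
        RetainStacks=False,  # also delete the stacks in the target accounts
        CallAs=call_as,
        **targets,
    )["OperationId"]
    status = wait_for_stack_set_operation(cfn, stack_set_name, operation_id,
                                          call_as, poll_seconds)
    if status != "SUCCEEDED":
        raise RuntimeError(f"Deleting stack instances finished with status {status}")
    cfn.delete_stack_set(StackSetName=stack_set_name, CallAs=call_as)


def delete_stack_and_wait(cfn: Any, stack_name: str) -> None:
    """Delete a regular stack (step 3 or 4) and block until deletion completes."""
    cfn.delete_stack(StackName=stack_name)
    cfn.get_waiter("stack_delete_complete").wait(StackName=stack_name)
```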

The deletion process may take several minutes to complete. Always double-check your AWS account to make sure that all related resources have been properly removed to avoid unexpected charges.
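To double-check programmatically, you can call a describe or get API for each resource and treat a "not found" error as confirmation that the resource was deleted. The following sketch takes a mapping of labels to zero-argument lookup functions. The helper names are hypothetical. The error codes listed are, to our knowledge, what these APIs return for a missing resource. Adjust them if your SDK reports different codes.

```python
"""Sketch: confirm that solution resources no longer exist after cleanup."""
from typing import Any, Callable, Dict, List

# Lambda GetFunction and EventBridge DescribeRule -> ResourceNotFoundException,
# IAM GetRole -> NoSuchEntity, SNS GetTopicAttributes -> NotFound,
# Systems Manager DescribeDocument -> InvalidDocument.
NOT_FOUND_CODES = {"ResourceNotFoundException", "NoSuchEntity", "NotFound", "InvalidDocument"}


def is_deleted(lookup: Callable[[], Any]) -> bool:
    """Run a describe/get call; a not-found error means the resource is gone."""
    try:
        lookup()
    except Exception as exc:  # botocore ClientError in practice
        code = getattr(exc, "response", {}).get("Error", {}).get("Code")
        if code in NOT_FOUND_CODES:
            return True
        raise  # throttling, access denied, and so on: surface rather than guess
    return False


def leftover_resources(checks: Dict[str, Callable[[], Any]]) -> List[str]:
    """Return the labels of resources that still exist."""
    return [label for label, lookup in checks.items() if not is_deleted(lookup)]
```

For example, an entry such as `"Lambda function": lambda: lambda_client.get_function(FunctionName="ReplaceVpnTunnelFunction")` checks the function that the template creates, where `lambda_client` is a Lambda client you create. An empty result means that every checked resource was removed.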

About the authors

Avanish Yadav is a Senior Networking Solutions Architect at AWS. With a passion for networking technologies, he enjoys innovating and helping users solve complex technical challenges by creating secure, scalable cloud architectures. When he’s not collaborating with users to provide expert solutions to their needs, he can often be found playing cricket outside of work. LinkedIn: /avanish-yadav-93b8a947

Tejas Majamudar is a Senior Technical Account Manager at AWS, where he partners with users to achieve operational excellence and optimize their cloud infrastructure. As a trusted advisor, Tejas helps organizations implement efficient risk management strategies and cost optimization initiatives, enabling them to maximize the value of their AWS investments. LinkedIn: /contacttejasm

Utkarsh Srivastav is a Cloud Architect at AWS, specializing in DevOps and AI/ML implementations. He works with users to enhance their cloud adoption through effective DevOps practices and AI/ML solutions. With experience in cloud architecture, containerization, automation, and MLOps, Utkarsh helps organizations build practical solutions on AWS. He focuses on aligning technical solutions with business strategies to help users optimize their development processes and drive tangible results. LinkedIn: /utk231