AWS Cloud Operations & Migrations Blog

How Capgemini uses AWS Systems Manager Automation runbooks to generate reports for AWS Backup activity

Centralizing and automating data protection helps you support your business continuity and regulatory compliance goals. Backup compliance includes the ability to define and enforce backup policies to encrypt your backups, protect them from manual deletion, prevent changes to your backup lifecycle settings, and audit and report on backup activity from a centralized console.

A common ask from customers who want to enhance their cloud security posture in AWS is to aggregate, organize, and prioritize security alerts (also called findings) across multiple AWS services and partner solutions, while performing continuous compliance checks and identifying risks associated with their AWS workloads. Many customers rely on Managed Service Providers to manage their AWS accounts and are looking for AWS-native solutions to these business problems.

Capgemini is a certified AWS Managed Service Provider (MSP), an AWS Premier Consulting Partner with seven AWS Competencies, and a member of the AWS Well-Architected Partner Program. It has a proven record of creating solutions that fit the unique and evolving needs of customers.

Cloud Operation Services (COS) from Capgemini is a managed service offering for AWS Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) solutions.

AWS Backup enables you to centralize and automate data protection across AWS services and hybrid workloads. AWS Backup offers a cost-effective, fully-managed, and policy-based service that further simplifies data protection at scale. AWS Backup also helps you support your regulatory compliance or business policies for data protection.

AWS Systems Manager is an operations hub for AWS that provides a unified user interface, so that users can track and resolve operational issues across their AWS applications and resources from a central location. With Systems Manager, you can automate operational tasks for servers running in a hybrid environment through a single interface. A hybrid environment includes on-premises servers and virtual machines (VMs) that have been configured for use with Systems Manager, including VMs in other cloud environments. You can also group resources by application, view operational data for monitoring and troubleshooting, implement pre-approved change workflows, and audit operational changes for your groups of resources. Systems Manager simplifies resource and application management, shortens the time to detect and resolve operational problems, and makes it easier to operate and manage your infrastructure at scale.

AWS Systems Manager Agent (SSM Agent) is Amazon software that can be installed and configured on an Amazon Elastic Compute Cloud (Amazon EC2) instance, an on-premises server, or a VM. SSM Agent makes it possible for Systems Manager to update, manage, and configure these resources. The agent processes requests from the Systems Manager service in the AWS Cloud, and then runs them as specified in the request. SSM Agent then sends status and execution information back to the Systems Manager service by using the Amazon Message Delivery Service (service prefix: ec2messages).

An AWS Systems Manager Automation Runbook (SSM document) defines the actions that Systems Manager performs on your managed instances and other AWS resources when an automation runs. Automation is a capability of Systems Manager. A runbook contains one or more steps that run in sequential order. Each step is built around a single action. Output from one step can be used as the input in a later step.

Amazon EventBridge is a serverless event bus that makes it easier to build event-driven applications at scale using events generated from your applications, integrated Software-as-a-Service (SaaS) applications, and AWS services.

Amazon Simple Notification Service (Amazon SNS) is a fully-managed messaging service for both application-to-application (A2A) and application-to-person (A2P) communication.

Prerequisites

Before you continue, make sure that the following prerequisites are met:

  1. All Amazon EC2 instances must be tagged with a specific tag key and value determined by the customer.
  2. Windows EC2 instances must have “user data execution” enabled.
  3. (Optional) Systems Manager VPC endpoints for managing private Amazon EC2 instances without internet access.
  4. Amazon EC2 instances must be registered with Systems Manager and must all be in the same AWS Region for backups.
  5. At least one resource supported by AWS Backup.

How Capgemini made it work

AWS Backup fulfils the backup and restore function. Reporting and monitoring functions are triggered by the SNOW-API (ServiceNow API), AWS Lambda, and the Backup Report Automation runbook.

Figure 1. How Capgemini uses AWS Systems Manager Automation runbooks to generate reports for AWS Backup activity

Solution overview

Backup vault and plan

In AWS Backup, a backup plan is a policy expression that defines when and how you want to back up your AWS resources, such as Amazon DynamoDB tables or Amazon Elastic File System (Amazon EFS) file systems. You can assign resources to backup plans, and AWS Backup automatically backs up and retains backups for those resources according to the backup plan. You can create multiple backup plans if you have workloads with different backup requirements.

A backup plan in AWS Backup sets the backup schedule, the start time window, and the backup retention period.

The schedule is set as a cron expression in the UTC time zone. For example, “cron(03 4 * * ? *)” creates a backup at 4:03 AM UTC every day. For the full syntax, see the AWS reference for cron expressions.
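As a minimal sketch of how such a schedule string could be assembled, the hypothetical helper below (the name `daily_backup_cron` is not part of the solution) builds a once-a-day expression from an hour and minute in UTC. It assumes the six-field AWS cron layout of minutes, hours, day-of-month, month, day-of-week, and year.

```python
def daily_backup_cron(hour: int, minute: int) -> str:
    """Build an AWS cron expression that fires once a day at hour:minute UTC."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute must be 0-59")
    # Field order: minutes hours day-of-month month day-of-week year.
    # One of day-of-month or day-of-week must be "?" in AWS cron syntax.
    return f"cron({minute} {hour} * * ? *)"


print(daily_backup_cron(4, 3))  # cron(3 4 * * ? *)
```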

The start time window is defined in minutes and sets the maximum time within which the creation of backups should start. When choosing this value, keep in mind that the start time window plus the time needed for backup creation should not be longer than the planned maintenance window.
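The sizing rule above is simple arithmetic, sketched here as a check. The function name and the sample durations are illustrative assumptions, not values from the solution.

```python
def fits_maintenance_window(
    start_window_min: int, backup_duration_min: int, maintenance_window_min: int
) -> bool:
    """Return True if the latest possible start plus the backup run time
    still ends inside the planned maintenance window."""
    return start_window_min + backup_duration_min <= maintenance_window_min


# 60-minute start window + 90-minute backup inside a 180-minute window.
print(fits_maintenance_window(60, 90, 180))   # True
# A 120-minute start window would overrun the same window by 30 minutes.
print(fits_maintenance_window(120, 90, 180))  # False
```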

Retention time is defined in days. Once that number of days has passed since a backup was created, the backup is removed. By default, the solution deploys daily, weekly, and monthly backup rules with defined times and retention periods. These values are parameterized and can be changed on deployment to suit client requirements.
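To make the three default rules concrete, here is a hedged sketch that builds a plan document in the shape accepted by the `BackupPlan` argument of the AWS Backup `CreateBackupPlan` API. The plan name, the schedules, and the retention values are placeholders chosen for illustration. The solution's actual defaults are not stated in this post.

```python
def build_backup_plan(vault_name: str, start_window_minutes: int = 60) -> dict:
    """Build a daily/weekly/monthly plan document (illustrative values only)."""
    # (rule name, schedule in UTC, retention in days) -- all assumed values.
    rule_specs = [
        ("Daily", "cron(3 4 * * ? *)", 7),
        ("Weekly", "cron(3 4 ? * SUN *)", 35),
        ("Monthly", "cron(3 4 1 * ? *)", 365),
    ]
    rules = [
        {
            "RuleName": name,
            "TargetBackupVaultName": vault_name,
            "ScheduleExpression": schedule,
            "StartWindowMinutes": start_window_minutes,
            "Lifecycle": {"DeleteAfterDays": retention_days},
        }
        for name, schedule, retention_days in rule_specs
    ]
    return {"BackupPlanName": "example-backup-plan", "Rules": rules}


plan = build_backup_plan("example-vault")
for rule in plan["Rules"]:
    print(rule["RuleName"], rule["ScheduleExpression"], rule["Lifecycle"])
```

With boto3 installed and credentials configured, a document like this could be passed as `boto3.client("backup").create_backup_plan(BackupPlan=plan)`.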

AWS Backup vault

An AWS Backup vault is a container that stores backup snapshots and manages access to them. Upon deployment, the vault is encrypted with an AWS Key Management Service (AWS KMS) key. This can be switched to a customer-managed key on request, prior to vault creation. The KMS key can be specified only at vault creation and cannot be changed afterward.

A backup vault is created as part of the solution.

Backup monitoring and alerting

By default, the following Backup, Copy, and Restore events will be monitored:

  • Backup job failure
  • Backup job expired prior to completion
  • Copy Snapshot failure
  • Restore Job failure

If any of these events occur, they are logged as tickets in the Managed Service Provider (MSP) ServiceNow instance to be actioned by support users.

These event types are monitored by the COS-AWS-Backup-Monitoring EventBridge rule. The rule triggers whenever the events occur in the Region in which the rule is deployed. Any changes to the monitored events must be made in the filter on the EventBridge rule itself.
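The filter on the COS-AWS-Backup-Monitoring rule is not shown in this post, so the following is only a sketch of what such a filter might express. It assumes AWS Backup events arrive with the source `aws.backup` and detail types such as "Backup Job State Change". The exact detail-type strings, state values, and the `state`/`status` field names should be checked against the AWS Backup EventBridge documentation before use.

```python
import json

# Assumed mapping of event detail-type to the states that should raise an alert.
ALERT_STATES = {
    "Backup Job State Change": {"FAILED", "EXPIRED"},
    "Copy Job State Change": {"FAILED"},
    "Restore Job State Change": {"FAILED"},
}

# Example EventBridge pattern covering the backup-job case only.
BACKUP_JOB_PATTERN = {
    "source": ["aws.backup"],
    "detail-type": ["Backup Job State Change"],
    "detail": {"state": sorted(ALERT_STATES["Backup Job State Change"])},
}


def should_alert(event: dict) -> bool:
    """Local stand-in for the rule filter: True if the event warrants a ticket."""
    if event.get("source") != "aws.backup":
        return False
    wanted = ALERT_STATES.get(event.get("detail-type"), set())
    detail = event.get("detail", {})
    # Accept either field name, since it may differ between job types.
    state = detail.get("state") or detail.get("status")
    return state in wanted


print(json.dumps(BACKUP_JOB_PATTERN, indent=2))
```

Further states, such as a cancelled job, could be added to `ALERT_STATES` in the same way if the rule needs to cover them.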

Alerts are sent to the ServiceNow SNS topic and forwarded to the COS-Lambda-SNOW-Listener Lambda function, which sends them to the MSP ServiceNow instance to be triaged and remediated by support personnel.
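The code of COS-Lambda-SNOW-Listener is not published here. The sketch below only illustrates the general shape of a Lambda handler that receives SNS-delivered alerts and turns each one into a ticket payload. The ticket field names and the urgency value are assumptions, and the call to the ServiceNow API is deliberately left out.

```python
import json


def build_ticket(sns_record: dict) -> dict:
    """Turn one SNS record from a Lambda event into a ticket payload."""
    sns = sns_record["Sns"]
    try:
        detail = json.loads(sns["Message"])
    except json.JSONDecodeError:
        # Keep non-JSON messages rather than dropping the alert.
        detail = {"raw": sns["Message"]}
    return {
        "short_description": sns.get("Subject") or "AWS Backup alert",
        "description": json.dumps(detail, indent=2, sort_keys=True),
        "urgency": "2",  # hypothetical default
    }


def lambda_handler(event, context):
    tickets = [build_ticket(record) for record in event.get("Records", [])]
    # A real listener would send each ticket to the ServiceNow API here.
    return {"tickets": tickets}
```

Returning the payloads instead of posting them keeps the sketch testable without network access or ServiceNow credentials.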

Backup reporting

Reports are generated daily, weekly, and monthly. Each report is initiated by an EventBridge rule that triggers on the corresponding schedule. When a rule fires, it starts the linked Systems Manager Automation runbook, which queries the appropriately tagged AWS Backup vaults, collates the backup data, generates the report, and stores it in an Amazon S3 bucket. Lastly, the COS-Lambda-Backup-Report-Generator Lambda function sends a URL to an email address so that the recipient can access the report. This is a pre-signed URL, signed with AWS Identity and Access Management (IAM) credentials issued through AWS Security Token Service (AWS STS), and it honors the expiry time limit set on it.
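As a rough illustration of the collate, report, and share steps, the sketch below summarizes a list of backup job records into CSV and requests a pre-signed download link. It assumes job records shaped like the `BackupJobs` entries returned by the AWS Backup `ListBackupJobs` API, and it takes the S3 client as a parameter so the logic can be exercised without AWS access. The function names, bucket, and key are hypothetical and are not the solution's own.

```python
import csv
import io
from collections import Counter


def build_report_csv(jobs: list) -> str:
    """Render one CSV row per backup job."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["BackupJobId", "ResourceArn", "State"])
    for job in jobs:
        writer.writerow([job["BackupJobId"], job["ResourceArn"], job["State"]])
    return buffer.getvalue()


def summarize_states(jobs: list) -> dict:
    """Count jobs per state, for example COMPLETED versus FAILED."""
    return dict(Counter(job["State"] for job in jobs))


def presign_report(s3_client, bucket: str, key: str, expires_in: int = 3600) -> str:
    """Ask the S3 client for a time-limited download URL for the report."""
    return s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )
```

With boto3 installed, `presign_report(boto3.client("s3"), "example-bucket", "reports/daily.csv")` would return a link that stops working after the expiry time, or earlier if the signing credentials expire first.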

SSM Automation Runbook – pre + post backup scripts

The ability to run pre-backup and post-backup actions is deployed as an extra function of the solution. This is accomplished with an SSM Automation runbook, which support staff can amend to suit bespoke client requirements.
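The runbook itself is not reproduced in this post. As a hedged sketch of what a pre/post-backup runbook could look like, the code below assembles an Automation document (schema version 0.3) as a Python dictionary. The step names, parameter names, and the choice of `AWS-RunShellScript` (which targets Linux instances) are all assumptions for illustration.

```python
import json


def script_step(name: str, commands: list) -> dict:
    """One step that runs shell commands on the instance through Run Command."""
    return {
        "name": name,
        "action": "aws:runCommand",
        "inputs": {
            "DocumentName": "AWS-RunShellScript",
            "InstanceIds": ["{{ InstanceId }}"],
            "Parameters": {"commands": commands},
        },
    }


def build_runbook(pre_commands: list, post_commands: list) -> dict:
    """Assemble a three-step runbook: pre-script, start backup, post-script."""
    return {
        "schemaVersion": "0.3",
        "description": "Illustrative pre/post backup runbook",
        "parameters": {
            "InstanceId": {"type": "String"},
            "ResourceArn": {"type": "String"},
            "BackupVaultName": {"type": "String"},
            "IamRoleArn": {"type": "String"},
        },
        "mainSteps": [
            script_step("PreBackup", pre_commands),
            {
                "name": "StartBackup",
                "action": "aws:executeAwsApi",
                "inputs": {
                    "Service": "backup",
                    "Api": "StartBackupJob",
                    "BackupVaultName": "{{ BackupVaultName }}",
                    "ResourceArn": "{{ ResourceArn }}",
                    "IamRoleArn": "{{ IamRoleArn }}",
                },
            },
            script_step("PostBackup", post_commands),
        ],
    }


runbook = build_runbook(["echo 'quiesce application'"], ["echo 'resume application'"])
print(json.dumps(runbook, indent=2))
```

In this sketch the post-backup step runs as soon as the backup job has been started. A runbook that must wait for the job to finish would need an extra waiting step, for example one based on the `aws:waitForAwsResourceProperty` action.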

Backup and restoration testing

To make sure that backup, restore, and alerting functions are operating properly, tests are conducted manually on a temporary Amazon EC2 instance. A manual backup job is submitted, and then a restoration of that job is submitted afterward. To test alerting, a manual backup job is cancelled during its run, which will cause the EventBridge rule to trigger, and an alert to be raised to the Capgemini ITSM (IT service management).
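The alerting test is described as a manual procedure. If it were scripted, it might look roughly like the sketch below, which starts an on-demand backup job and then stops it. The AWS Backup client is passed in as a parameter, and the function name and arguments are hypothetical.

```python
def run_alert_test(backup_client, vault_name: str, resource_arn: str, iam_role_arn: str) -> str:
    """Start an on-demand backup job, then cancel it to exercise alerting.

    Returns the ID of the cancelled job so it can be matched to the ticket.
    """
    started = backup_client.start_backup_job(
        BackupVaultName=vault_name,
        ResourceArn=resource_arn,
        IamRoleArn=iam_role_arn,
    )
    job_id = started["BackupJobId"]
    # Stopping the job mid-run is what should cause the monitoring rule to fire.
    backup_client.stop_backup_job(BackupJobId=job_id)
    return job_id
```

With boto3 installed, `run_alert_test(boto3.client("backup"), ...)` could be pointed at the temporary test instance, after which the resulting ticket should appear in the ITSM tool.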

Summary

Capgemini now offers a solution that manages the backups of your Amazon EC2 instances with end-to-end automation, backup monitoring, and alerting when issues are found. To learn more about Capgemini and how it can assist with your business challenges related to management and governance, visit Capgemini Cloud Platform. To learn more about how Systems Manager can be used to manage instances in a hybrid environment, visit AWS Cloud Operation Services.

About the authors:

Terri Johnson

Terri is a Senior Solutions Architect supporting customers in the South Florida Territory who are early in their AWS Cloud journey. Terri is also an official AWS Mentor for junior Solution Architects and for members of the Amazon Military Apprentice Program. The Military Apprenticeship program at AWS helps members of the military community—veterans and their spouses—transition to careers in cloud computing. She has been with Amazon Web Services for 3 years, and began her career in the AWS Public Sector (2018), as a Partner SA supporting large Global AWS Partners.

Swara Meghana

Swara Meghana is a Cloud Infrastructure Architect with AWS Professional Services. She is an AWS certified Solution Architect Professional and provides customers with technical guidance on management, governance and security of Cloud Infrastructure. In her free time, she enjoys spending time with friends and family and watching movies.

David Wansell

David Wansell is an Enterprise Cloud Architect at Capgemini with over 20 years of experience across multiple enterprise domains. He designs and builds automation and solutions that enable customers to deliver on their desired outcomes in their cloud adoption journey.