AWS Storage Blog
Monitoring the health of Amazon FSx file systems using Amazon EventBridge and AWS Lambda
Storage administrators who are using managed services need a way to monitor the health of their resources, in order to detect any changes in resource health or accessibility that might require their attention or intervention. For administrators managing large fleets of resources, this monitoring needs to be efficient and scalable so that any alerts are routed to the right team and with the right priority to prevent any impact on the business.
Amazon FSx makes it easy and cost effective to launch, run, and scale feature-rich, high-performance file systems in the cloud. All Amazon FSx file systems provide native monitoring and logging functionality that gives administrators visibility into the health of their file systems. One important health indicator is a file system’s Lifecycle status, which is available through the Amazon FSx console or the AWS CLI. However, some customers need to efficiently monitor the health of large numbers of file systems, and to set up notifications or rules based on changes in file system status.
In this blog, I will provide instructions and code for efficiently monitoring the health of any number of Amazon FSx file systems by configuring notifications on file system status changes. This solution will enable you to quickly detect and take action if your file systems aren’t healthy, allowing you to ensure business continuity.
Figure 1: Monitoring the health of Amazon FSx file systems using Amazon EventBridge and AWS Lambda
I cover the manual implementation so you can see how each component is configured, but also provide two automated solutions at the end of the post for easier deployment.
Workflow
You can choose between four widely-used Amazon FSx file system types: Windows File Server, Lustre, NetApp ONTAP, and OpenZFS. You can choose which file system type to use based on your familiarity with a given file system, or by matching the feature sets, performance profiles, and data management capabilities to the requirements of your workload. (Note this solution does not evaluate the health of FSx for NetApp ONTAP Storage Virtual Machines.)
For this solution, you will be using Amazon FSx for Windows File Server. A Lambda function queries the Amazon FSx API on a schedule you define, evaluates the health of the Amazon FSx file systems provisioned in a given Region, and sends you an email notification when a file system needs attention. The solution uses the following services:
- Amazon FSx for Windows File Server
- Amazon SNS
- AWS Lambda
- Amazon EventBridge
- IAM
The workflow to implement the solution is:
- Create an Amazon SNS topic and an email subscription.
- Create an AWS Lambda Role with a custom policy to query FSx for its status.
- Create an AWS Lambda function to run your Python code.
- Confirm the Amazon SNS topic subscription.
- Create an Amazon EventBridge rule.
- Manually run your Lambda function to trigger delivery of your status notification.
1. Creating an Amazon SNS topic and email subscription
Amazon Simple Notification Service (SNS) is a fully managed messaging service. I will use SNS to capture the output of the Lambda function and send a message to your operations team based on the status of your file systems. In this design, I use email as the notification mechanism; however, because the messages go through SNS, you can integrate with virtually any platform.
- Log into the AWS Management console and browse to the SNS console.
- Expand the left-hand panel and select Topics.
- Select Create topic.
- Set the type to Standard.
- Set the name and description to fsx-health.
- Select Create topic.
Next, I create an SNS Subscription. An SNS subscription is versatile and allows your organization to surface your Amazon FSx file system health status to a variety of downstream services including:
- SMS
- Amazon Kinesis Data Firehose
- AWS Lambda
- AWS Chatbot
To configure a subscription for your Topic, follow these steps:
- Select your fsx-health topic in the console.
- Select the Subscriptions tab.
- Select Create subscription.
- Select the protocol to be Email.
- Set the Endpoint to your destination distribution list or individual email address.
- Select Create Subscription.
- Before navigating away from the SNS console, obtain the SNS Topic ARN under details.
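The console steps above can also be scripted. The following is a minimal sketch, assuming a boto3 SNS client is passed in; the helper name `create_topic_with_email` is illustrative and not part of the original solution.

```python
def create_topic_with_email(sns_client, topic_name, email_address):
    """Create (or reuse) a standard SNS topic and add an email subscription.

    sns_client is expected to be a boto3 SNS client, e.g. boto3.client("sns").
    Returns the topic ARN, which the Lambda function needs later.
    """
    # create_topic is idempotent: an existing topic with this name is reused
    topic_arn = sns_client.create_topic(Name=topic_name)["TopicArn"]

    # The recipient must still confirm the subscription from their inbox
    sns_client.subscribe(
        TopicArn=topic_arn,
        Protocol="email",
        Endpoint=email_address,
    )
    return topic_arn
```

With boto3 installed and credentials configured, this could be called as `create_topic_with_email(boto3.client("sns"), "fsx-health", "ops@example.com")`, where the address is a placeholder for your own.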
You are now ready to create the IAM Role for the Lambda function to use.
2. Creating a Lambda execution role in the IAM console
A Lambda function’s execution role is an AWS Identity and Access Management (IAM) role that grants the function permission to access AWS services and resources. You provide this role when you create a function, and Lambda assumes the role when the function is invoked. In this solution, you will need to access your custom SNS Topic, CloudWatch Logs, and Amazon FSx.
- Navigate to the AWS Management Console and browse to the IAM console.
- Select Roles and choose Create role.
- Set the Trusted entity type to AWS Service.
- Set the Use case to Lambda and choose Next.
- Under Add permissions, select Create policy, then select JSON.
- Enter the following policy, replacing <REGION> and <ACCOUNTID> with your Region and account ID (both appear in the SNS topic ARN you noted in the previous section):
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:<REGION>:<ACCOUNTID>:log-group:/aws/lambda/*",
            "Effect": "Allow"
        },
        {
            "Action": [
                "fsx:DescribeFileSystems"
            ],
            "Resource": "*",
            "Effect": "Allow"
        },
        {
            "Action": [
                "sns:Publish"
            ],
            "Resource": "*",
            "Effect": "Allow"
        }
    ]
}
- Select Next: Tags, then Next: Review, enter fsx-lambda-sns as the name, and select Create policy. Navigate back to the Add permissions page and select the Refresh icon next to Create policy.
a. Under Add permissions, search for fsx-lambda-sns and select the policy you just created.
b. Select Next and confirm that the permissions policy is attached.
c. Enter a role name of fsx-lambda and select Create role.
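If you prefer to generate the policy rather than edit placeholders by hand, the sketch below builds the same document in Python. It makes one deliberate change, which is my assumption and not part of the policy shown above: `sns:Publish` is scoped to a single topic ARN instead of `*`. The helper name `build_execution_policy` is hypothetical.

```python
def build_execution_policy(region, account_id, topic_arn):
    """Return the fsx-lambda-sns policy document as a Python dict."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Let the function write its own CloudWatch Logs
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                "Resource": "arn:aws:logs:{}:{}:log-group:/aws/lambda/*".format(
                    region, account_id
                ),
            },
            {
                # Read-only query of file system status
                "Effect": "Allow",
                "Action": ["fsx:DescribeFileSystems"],
                "Resource": "*",
            },
            {
                # Assumption: publish only to the fsx-health topic, not "*"
                "Effect": "Allow",
                "Action": ["sns:Publish"],
                "Resource": topic_arn,
            },
        ],
    }


# Trust policy that lets the Lambda service assume the execution role
LAMBDA_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}
```

Both documents can be serialized with `json.dumps` and passed to the boto3 IAM client, for example as `PolicyDocument` in `create_policy` and `AssumeRolePolicyDocument` in `create_role`.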
3. Create an AWS Lambda function
In this step, you will use AWS Lambda to execute the code without the need to provision and manage servers. This provides a simple and customizable platform to query and report on Amazon FSx file system health. The code is written in Python and will evaluate multiple file systems in one execution, send notifications if the file system status is not healthy, and send a separate notification per file system.
1. Navigate to the AWS Management Console and browse to the Lambda Console.
2. Select Create function.
a. Select Author from scratch.
b. For Function name enter fsx-health.
c. Set the Runtime to Python 3.8, or to a newer Python 3 runtime if 3.8 is not offered in your console.
d. Set the Architecture to x86_64.
3. Expand Permissions, under Execution Role select Use an existing role, and select your fsx-lambda role.
4. Select Create function.
5. In the Code source window please delete the sample placeholder code and replace it with:
import os

import boto3


def lambda_handler(event, context):
    # ARN of the SNS topic, read from the LambdaSNSTopic environment variable
    topic_arn = os.environ['LambdaSNSTopic']
    fsx = boto3.client('fsx')
    sns_client = boto3.client('sns')

    # Paginate so that every file system in the Region is evaluated
    paginator = fsx.get_paginator('describe_file_systems')
    for page in paginator.paginate():
        for filesystem in page.get('FileSystems', []):
            status = filesystem.get('Lifecycle')
            filesystem_id = filesystem.get('FileSystemId')
            if status != 'AVAILABLE':
                # Publish one notification per unhealthy file system
                print("The file system: {} needs attention.".format(filesystem_id))
                sns_client.publish(
                    TopicArn=topic_arn,
                    Subject="FSx Health Warning!",
                    Message="File System: " + filesystem_id + " needs attention. The status is: " + status,
                )
            else:
                print("The file system: {} is in a healthy state, and is reachable and available for use.".format(filesystem_id))
6. Navigate to Configuration, select Environment variables, select Edit, and choose Add environment variable. Enter LambdaSNSTopic as the key and <your fsx-health SNS topic ARN> as the value.
a. Consider using AWS CloudShell to list your SNS topics and their ARNs.
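If you did not note the topic ARN earlier, it can be looked up by name. The following is a minimal sketch, assuming a boto3 SNS client is passed in; `find_topic_arn` is a hypothetical helper name.

```python
def find_topic_arn(sns_client, topic_name):
    """Return the ARN of the topic with the given name, or None if absent.

    sns_client is expected to be a boto3 SNS client. list_topics returns
    results in pages, so keep following NextToken until it is absent.
    """
    kwargs = {}
    while True:
        page = sns_client.list_topics(**kwargs)
        for topic in page.get("Topics", []):
            arn = topic["TopicArn"]
            # The topic name is the last colon-separated field of the ARN
            if arn.rsplit(":", 1)[-1] == topic_name:
                return arn
        token = page.get("NextToken")
        if not token:
            return None
        kwargs = {"NextToken": token}
```

For example, `find_topic_arn(boto3.client("sns"), "fsx-health")` would return the value to paste into the LambdaSNSTopic environment variable.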
Deploy your Lambda function
- After pasting the code and setting the LambdaSNSTopic environment variable, select Deploy.
- Select Test, enter Test as the Event name, and select Save.
- The test will open a new tab called Execution results and the code execution should succeed.
Figure 2: Lambda execution results
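The status check itself can also be exercised locally, without AWS access, before relying on the console test. The sketch below assumes input shaped like the `FileSystems` list returned by `describe_file_systems`; `unhealthy_messages` is a hypothetical helper, and the file system IDs are made up.

```python
def unhealthy_messages(filesystems):
    """Build one notification message per file system that is not AVAILABLE."""
    messages = []
    for filesystem in filesystems:
        status = filesystem.get("Lifecycle")
        filesystem_id = filesystem.get("FileSystemId")
        if status != "AVAILABLE":
            messages.append(
                "File System: {} needs attention. The status is: {}".format(
                    filesystem_id, status
                )
            )
    return messages


# Sample data with made-up IDs, shaped like the FileSystems list from the API
sample = [
    {"FileSystemId": "fs-0123456789abcdef0", "Lifecycle": "AVAILABLE"},
    {"FileSystemId": "fs-0fedcba9876543210", "Lifecycle": "MISCONFIGURED"},
]
for message in unhealthy_messages(sample):
    print(message)
```

Keeping the evaluation in a function that takes plain data makes it straightforward to try other Lifecycle values, such as FAILED or CREATING, without provisioning a file system in that state.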
4. Confirm your Amazon SNS topic subscription
This step is necessary because SNS does not deliver FSx file system health check notifications to an email address until the subscription has been confirmed.
- Log into your email and check to see if you have a new email from SNS.
Figure 3: SNS subscription confirmation
- Select Confirm Subscription.
- You are now able to receive emails from your Lambda function.
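To verify the confirmation programmatically, you can list the subscriptions on the topic. This is a minimal sketch, assuming a boto3 SNS client is passed in; `pending_endpoints` is a hypothetical helper name.

```python
def pending_endpoints(sns_client, topic_arn):
    """Return endpoints on the topic that have not confirmed their subscription.

    sns_client is expected to be a boto3 SNS client. Unconfirmed subscriptions
    are reported with the placeholder SubscriptionArn "PendingConfirmation".
    """
    pending = []
    kwargs = {"TopicArn": topic_arn}
    while True:
        page = sns_client.list_subscriptions_by_topic(**kwargs)
        for subscription in page.get("Subscriptions", []):
            if subscription.get("SubscriptionArn") == "PendingConfirmation":
                pending.append(subscription.get("Endpoint"))
        token = page.get("NextToken")
        if not token:
            return pending
        kwargs["NextToken"] = token
```

An empty list means every subscriber on the topic has confirmed and will receive notifications.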
5. Create an Amazon EventBridge rule
Use an Amazon EventBridge rule to schedule the execution of your Amazon FSx file system health check. As an alternative, and depending on your organizational preference, you can use Systems Manager Maintenance Windows.
1. Navigate to the AWS Management Console and browse to the Amazon EventBridge console.
2. Under Events, select Rules and choose Create Rule.
a. Enter fsx-health-trigger as the Rule Name.
b. Leave Event bus as default and leave Enable the rule on the selected event bus turned on.
c. Change Rule type to Schedule and select Next.
d. Set the Schedule Pattern to A schedule that runs at a regular rate, such as every 10 minutes.
e. Set the rate to something reasonable. If you invoke the Lambda function too frequently you may begin to incur unwanted costs within your account.
f. Under Select target(s), choose AWS Service, select Lambda Function, and then choose your fsx-health function.
g. Leave the settings as default and select Next twice.
3. Choose Create Rule.
Figure 4: FSx Health EventBridge rule
Congratulations! Your function is now scheduled for regular execution.
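The same rule can be created with boto3. The sketch below is an illustration under stated assumptions: `rate_expression` and `schedule_function` are hypothetical helper names, the clients are passed in by the caller, and the statement ID and target ID are arbitrary labels I chose. Unlike the console, the API does not add the invoke permission for you, so the sketch adds it explicitly.

```python
def rate_expression(value, unit):
    """Build a rate expression such as rate(1 hour) or rate(5 minutes)."""
    if unit not in ("minute", "hour", "day"):
        raise ValueError("unit must be minute, hour, or day")
    if value < 1:
        raise ValueError("value must be a positive integer")
    # A value of 1 takes a singular unit; larger values take the plural
    return "rate({} {})".format(value, unit if value == 1 else unit + "s")


def schedule_function(events_client, lambda_client, rule_name, function_arn, schedule):
    """Create a scheduled rule, allow it to invoke the function, add the target."""
    rule_arn = events_client.put_rule(
        Name=rule_name,
        ScheduleExpression=schedule,
        State="ENABLED",
    )["RuleArn"]

    # Resource-based permission so EventBridge may invoke the function
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=rule_name + "-invoke",  # hypothetical statement ID
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )

    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "fsx-health", "Arn": function_arn}],
    )
    return rule_arn
```

A call might look like `schedule_function(boto3.client("events"), boto3.client("lambda"), "fsx-health-trigger", function_arn, rate_expression(1, "hour"))`. Note that `add_permission` raises an error if a statement with the same ID already exists, so rerunning the sketch unchanged would need that case handled.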
6. Manually run your Lambda function to trigger delivery of your status notification
In this step, you will manually run the fsx-health function to ensure your messages are being delivered.
- Navigate to the AWS Management Console and browse to the Lambda Console.
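- On the left-hand pane, select Functions and choose the fsx-health function from earlier.
- Select Test to manually trigger your Lambda function and evaluate the health of your Amazon FSx file systems.
- Check your email to confirm you received the notification. The function only publishes a message when a file system is in a state other than AVAILABLE, so no email arrives if all of your file systems are healthy.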
Figure 5: SNS email notification
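The manual run can also be triggered from a script. This is a minimal sketch, assuming a boto3 Lambda client is passed in; `run_health_check` is a hypothetical helper name, and the empty event reflects the fact that the handler does not read its event.

```python
import json


def run_health_check(lambda_client, function_name="fsx-health"):
    """Invoke the function synchronously and report whether it ran cleanly."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # wait for the result
        Payload=json.dumps({}).encode("utf-8"),  # the handler ignores the event
    )
    # FunctionError is only present when the handler raised or timed out
    succeeded = response["StatusCode"] == 200 and "FunctionError" not in response
    return succeeded, response["Payload"].read().decode("utf-8")
```

Calling `run_health_check(boto3.client("lambda"))` returns a success flag together with the raw response payload, which is useful when the function fails and you want the error details without opening the console.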
Pricing
This solution is lightweight, and for daily checks, the usage will fit into the Lambda free tier. In these examples, I am running this hourly; however, for email integration this would be noisy without a service to interpret and present the data to recipients. Several factors could change the cost of the solution: if you are using a separate monitoring platform and increase the execution frequency, the amount of memory allocated, or the number of file systems evaluated (which increases execution duration), your cost could change. AWS recommends using the AWS Pricing Calculator to better understand the impact of any changes on your infrastructure.
Figure 6: Amazon FSx Health Pricing
Lambda execution scheduled for every 5 minutes still fits within free tier usage
CloudWatch Logs at 1 GB a month of storage: $0.50 per month
SNS hourly emails: $0.01
Yearly total: $6.17
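To see why even a 5-minute schedule stays within the Lambda free tier, the arithmetic can be sketched as follows. The memory size (128 MB), duration (1 second per run), and 30-day month are my assumptions for illustration, and the free tier allowances (1 million requests and 400,000 GB-seconds per month) should be checked against the current Lambda pricing page.

```python
LAMBDA_FREE_REQUESTS = 1_000_000   # free requests per month
LAMBDA_FREE_GB_SECONDS = 400_000   # free compute per month


def monthly_usage(interval_minutes, memory_mb=128, duration_seconds=1.0, days=30):
    """Estimate monthly invocations and GB-seconds for a fixed schedule."""
    invocations = (24 * 60 // interval_minutes) * days
    gb_seconds = invocations * (memory_mb / 1024) * duration_seconds
    return invocations, gb_seconds


for label, minutes in (("every 5 minutes", 5), ("hourly", 60), ("daily", 1440)):
    invocations, gb_seconds = monthly_usage(minutes)
    within = (
        invocations <= LAMBDA_FREE_REQUESTS
        and gb_seconds <= LAMBDA_FREE_GB_SECONDS
    )
    print(label, invocations, gb_seconds, "within free tier" if within else "exceeds free tier")
```

Under these assumptions a 5-minute schedule produces 8,640 invocations and 1,080 GB-seconds per month, far below both allowances. Longer runs or more memory scale the GB-seconds figure proportionally.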
Automated deployment
You can download a CloudFormation template to deploy the solution here. You will deploy the template as a standard CloudFormation stack. During deployment you will need to specify the email address where you would like to send your notifications and the frequency of execution for the health check. By default, the template will set the schedule to run the check every 5 minutes. You will need to ensure you confirm the SNS subscription to begin to receive notifications.
A Terraform project is also available for this solution. Terraform is an open-source infrastructure as code (IaC) tool that allows you to build, change, and version infrastructure safely and efficiently. This includes both low-level components like compute instances, storage, and networking, as well as high-level components like DNS entries and SaaS features. To use the Terraform solution, you will need to do the following:
- Set up a machine configured to run the latest version of Terraform.
- Download the Terraform infrastructure composition for the solution here.
- Change the variables.tf file to include your account ID, desired region, base naming convention, and email.
- Run terraform init.
- Run terraform plan and review any errors.
- Run terraform apply.
- Subscribe to the SNS topic to begin receiving emails.
Conclusion
In this blog post, you learned how to more efficiently monitor the status of your Amazon FSx file systems, and to configure notifications when a file system status changes. By implementing this approach across your organization, you will have a simple and repeatable solution to monitor the health of your Amazon FSx file systems with minimal annual cost.
You can modify this solution to send notifications to many endpoints, giving organizations the flexibility to adopt it as part of their infrastructure monitoring strategy. I recommend reviewing how this solution can integrate with AWS Chatbot to send notifications to Slack. If you are looking to implement this in a centralized fashion, you can leverage AWS Systems Manager Automation to run this code against multiple accounts and Regions from a single account.
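As a simpler starting point than Systems Manager Automation, the same check can loop over several Regions within one account. The sketch below is my own illustration, not the Automation approach: `unhealthy_by_region` is a hypothetical helper, and the caller supplies a factory that returns an FSx client for a Region. Cross-account checks would additionally need role assumption, which is not shown.

```python
def unhealthy_by_region(regions, client_factory):
    """Map each Region to the file systems there that are not AVAILABLE.

    client_factory is a callable that takes a Region name and returns an FSx
    client, for example: lambda region: boto3.client("fsx", region_name=region)
    """
    results = {}
    for region in regions:
        fsx = client_factory(region)
        unhealthy = []
        kwargs = {}
        while True:
            page = fsx.describe_file_systems(**kwargs)
            for filesystem in page.get("FileSystems", []):
                if filesystem.get("Lifecycle") != "AVAILABLE":
                    unhealthy.append(
                        (filesystem.get("FileSystemId"), filesystem.get("Lifecycle"))
                    )
            # Follow NextToken so large fleets are fully covered
            token = page.get("NextToken")
            if not token:
                break
            kwargs = {"NextToken": token}
        results[region] = unhealthy
    return results
```

Each entry in the result could then be published to a single central SNS topic, so that one subscription covers every Region.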