AWS Storage Blog

Automate monitoring at scale for Amazon FSx for NetApp ONTAP volumes

User files are increasingly growing in number and size. Maintaining and managing file growth can be challenging without an effective set of tools and automation that scales with your data growth. Customers agree that visibility is key for managing existing files and for developing a plan to support future growth.

Amazon CloudWatch is a service that monitors applications, responds to performance changes, optimizes resource use, and provides insights into operational health. By collecting data across AWS resources, CloudWatch gives visibility into system-wide performance and allows users to set alarms, automatically react to changes, and gain a unified view of operational health. You can monitor your Amazon FSx for NetApp ONTAP volumes across several metrics using Amazon CloudWatch. However, users can have tens or hundreds of volumes across multiple file systems, and it can be challenging to build monitoring at scale for these volumes, whether programmatically or through the AWS Management Console.

In this post, we show you an automated solution that uses Amazon CloudWatch, AWS Lambda, and Amazon Simple Notification Service (Amazon SNS) to monitor your FSx for ONTAP volumes and file systems at scale.

Solution overview

In this solution, you use an AWS CloudFormation template that provisions and invokes a Lambda function with the necessary permissions and an SNS topic. The Lambda function creates a CloudWatch alarm for every FSx for ONTAP volume and file system in the AWS Region. Specifically, the alarms monitor the file system metrics or volume metrics of your choosing. Figure 1 illustrates the solution architecture.

Figure 1: Architecture diagram for CloudWatch monitoring Amazon FSx for NetApp ONTAP file systems and volumes
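The discovery step described above, finding every FSx for ONTAP file system and volume in a Region, could look roughly like the following. This is a minimal sketch and not the code shipped in the template: the helper names `paginate`, `ontap_targets`, and `discover_targets` are hypothetical, and the choice of `FileSystemId` and `VolumeId` as alarm dimensions is an assumption (some metrics may need additional dimensions).

```python
from typing import Callable, Dict, Iterator, List


def paginate(call: Callable[..., dict], result_key: str) -> Iterator[dict]:
    """Yield every item from a describe_* call that pages with NextToken."""
    kwargs: Dict[str, str] = {}
    while True:
        page = call(**kwargs)
        yield from page.get(result_key, [])
        token = page.get("NextToken")
        if not token:
            return
        kwargs = {"NextToken": token}


def ontap_targets(file_systems: List[dict], volumes: List[dict]) -> List[Dict[str, str]]:
    """Return one dimension set per ONTAP file system and per ONTAP volume."""
    targets: List[Dict[str, str]] = []
    for fs in file_systems:
        if fs.get("FileSystemType") == "ONTAP":
            targets.append({"FileSystemId": fs["FileSystemId"]})
    for vol in volumes:
        if vol.get("VolumeType") == "ONTAP":
            targets.append({"FileSystemId": vol["FileSystemId"], "VolumeId": vol["VolumeId"]})
    return targets


def discover_targets(region: str) -> List[Dict[str, str]]:
    """Query Amazon FSx in one Region (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without AWS access

    fsx = boto3.client("fsx", region_name=region)
    return ontap_targets(
        list(paginate(fsx.describe_file_systems, "FileSystems")),
        list(paginate(fsx.describe_volumes, "Volumes")),
    )
```

Each returned dictionary can then serve as the dimension set for one alarm. Keeping the filtering in `ontap_targets` separate from the API calls makes the logic easy to check without an AWS account.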

In the following sections, we discuss how to set up the resources necessary for this FSx for ONTAP volume monitoring solution.

Prerequisites

To complete this walkthrough you should have the following:

  • An AWS account
  • An existing FSx for ONTAP file system with volumes

Create solution resources with CloudFormation

You use a CloudFormation template to set up the solution resources. The CloudFormation stack creates a Lambda function, an AWS Identity and Access Management (IAM) execution role for the Lambda function, an SNS topic, and an Amazon SNS subscription. Follow these instructions to get started:

1. Download the CloudFormation template (YAML file).

2. On the CloudFormation console, select the Region where you would like to deploy this solution.

3. Create a stack with new resources, as shown in Figure 2:

Figure 2: Create a stack with new resources

4. Upload the YAML file, as shown in Figure 3:

Figure 3: Upload the YAML file

5. Enter a name for the stack.

6. Type in the file system metrics and volume metrics that you would like to monitor in the fields SelectedFileSystemMetrics and SelectedVolumeMetrics.

7. Type in the CloudWatch alarm operator you want to use for each corresponding metric in the fields SelectedFileSystemMetricsOperator and SelectedVolumeMetricsOperator.

8. Type in the CloudWatch alarm period you want to use for each corresponding metric. A period is the length of time associated with a specific CloudWatch statistic.

9. Type in the CloudWatch alarm statistic you want to use for each corresponding metric. Each statistic represents an aggregation of the metrics data collected for a specified period of time.

10. Type in the CloudWatch alarm threshold you want to use for each corresponding metric.

For example, by default, this stack creates CloudWatch alarms on StorageUsed and CPUUtilization for every file system. Both alarms use the “Greater Than” operator, a period of 240 seconds, and the “Average” statistic. The threshold is 100,000,000 for StorageUsed and 80 for CPUUtilization. This is shown in Figure 4:

Figure 4: Stack for creating the CloudWatch alarm

11. Optionally, choose an IAM role for the stack. If no IAM role is selected, then CloudFormation uses the credentials from the current user. Select Next.

12. Select the checkbox next to I acknowledge that AWS CloudFormation might create IAM resources with custom names. Then, select Submit, as shown in Figure 5:

Figure 5: Select checkbox

After the CloudFormation stack has been created, you have a CloudWatch alarm for each metric you selected, for every FSx for ONTAP file system and volume in the Region. You can verify this in the CloudWatch alarms console. Figure 6 is an example image of the CloudWatch console after the stack was created, showing StorageUsed and CPUUtilization alarms for each file system, and DataReadOperationTime and DataWriteOperationTime alarms for each volume. The warning appears because no endpoints are subscribed to the SNS topic yet, which we address in the next section.

Figure 6: Example image of CloudWatch console after the stack was created
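To make the relationship between the stack parameters and the resulting alarms more concrete, the following sketch shows how one metric's operator, period, statistic, and threshold might map onto the arguments of the CloudWatch `put_metric_alarm` API. It is an illustration under assumptions, not the template's actual Lambda code: the `build_alarm` helper, the `FSx-Monitoring` alarm name prefix, the operator labels other than “Greater Than”, and the single evaluation period are all hypothetical.

```python
from typing import Dict

# Console-style operator label -> CloudWatch ComparisonOperator value.
OPERATORS = {
    "Greater Than": "GreaterThanThreshold",
    "Greater Than Or Equal": "GreaterThanOrEqualToThreshold",
    "Less Than": "LessThanThreshold",
    "Less Than Or Equal": "LessThanOrEqualToThreshold",
}

# The file system defaults described in this post.
DEFAULT_FILE_SYSTEM_ALARMS = [
    {"metric": "StorageUsed", "operator": "Greater Than", "period": 240,
     "statistic": "Average", "threshold": 100_000_000},
    {"metric": "CPUUtilization", "operator": "Greater Than", "period": 240,
     "statistic": "Average", "threshold": 80},
]


def build_alarm(setting: dict, dimensions: Dict[str, str], topic_arn: str) -> dict:
    """Build keyword arguments for cloudwatch.put_metric_alarm for one resource."""
    # Hypothetical naming scheme: prefix, metric name, then the resource IDs.
    name_parts = ["FSx-Monitoring", setting["metric"]]
    name_parts += [dimensions[key] for key in sorted(dimensions)]
    return {
        "AlarmName": "-".join(name_parts),
        "Namespace": "AWS/FSx",
        "MetricName": setting["metric"],
        "Dimensions": [{"Name": k, "Value": v} for k, v in sorted(dimensions.items())],
        "ComparisonOperator": OPERATORS[setting["operator"]],
        "Period": setting["period"],
        "Statistic": setting["statistic"],
        "Threshold": float(setting["threshold"]),
        "EvaluationPeriods": 1,  # assumption: alarm after one breaching period
        "AlarmActions": [topic_arn],  # notify when the threshold is breached
        "OKActions": [topic_arn],  # notify when the alarm returns to OK
    }
```

With AWS credentials available, each dictionary could be passed to CloudWatch as `boto3.client("cloudwatch").put_metric_alarm(**params)`. Some metrics may require more dimensions than the resource IDs alone, so check the dimension set each metric is published with before relying on a sketch like this.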

Create an Amazon SNS subscription

To be notified when the CloudWatch alarms you built enter the ALARM state, you need to create an Amazon SNS subscription for the SNS topic that the stack created. To do this, go through the following steps, also shown in Figure 7:

1. Go to the Amazon SNS console in the correct Region.

2. Select the FSx-Monitoring-Solution topic.

3. Under Subscriptions, select Create subscription.

Figure 7: Create Amazon SNS subscription

4. Select the protocol of your choice. Amazon SNS supports a wide variety of destinations, both application-to-application (A2A), such as Lambda and Amazon Simple Queue Service (Amazon SQS), and application-to-person (A2P), such as SMS, email, and PagerDuty, as shown in Figure 8:

Figure 8: Create subscription

5. After you’ve filled out the Protocol and endpoint details, select Create subscription.

6. You are now notified, through the protocol and endpoint you selected, about the FSx for ONTAP CloudWatch alarms that this solution created in the preceding section. Figure 9 is an example email from an alarm generated when the CPU utilization is greater than 80%, showing the information necessary to take action, such as account ID, volume ID, file system ID, AWS Region, and timestamp.

Figure 9: Example email from an alarm generated when the CPU utilization is greater than 80%
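If you subscribe a Lambda function instead of an email address, the same details arrive as a JSON document inside the SNS event. The sketch below shows one way such a subscriber might extract the fields needed to take action. It is a hedged illustration: the function names are hypothetical, and the message field names (`AlarmName`, `NewStateValue`, `AWSAccountId`, `Region`, `StateChangeTime`, and `Trigger` with lowercase `name` and `value` dimension keys) are assumed from the usual shape of CloudWatch alarm notifications, so verify them against a real message before depending on them.

```python
import json
from typing import Dict, List


def summarize_alarms(event: dict) -> List[Dict[str, object]]:
    """Extract the actionable fields from each SNS record in a Lambda event."""
    summaries = []
    for record in event.get("Records", []):
        # The alarm notification is a JSON string in the SNS message body.
        message = json.loads(record["Sns"]["Message"])
        trigger = message.get("Trigger", {})
        dimensions = {d["name"]: d["value"] for d in trigger.get("Dimensions", [])}
        summaries.append({
            "alarm": message.get("AlarmName"),
            "state": message.get("NewStateValue"),
            "account_id": message.get("AWSAccountId"),
            "region": message.get("Region"),
            "timestamp": message.get("StateChangeTime"),
            "metric": trigger.get("MetricName"),
            "file_system_id": dimensions.get("FileSystemId"),
            "volume_id": dimensions.get("VolumeId"),  # None for file system alarms
        })
    return summaries


def lambda_handler(event, context):
    """Hypothetical Lambda entry point: log one line per alarm notification."""
    for summary in summarize_alarms(event):
        print(json.dumps(summary))
    return {"processed": len(event.get("Records", []))}
```

From here, the summary could be forwarded to a ticketing system or used to trigger remediation, depending on what your operations process needs.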

Cleaning up

If you would like to decommission this solution, you can delete the CloudFormation stack, which deletes the Lambda function, SNS topic, Lambda execution role, and policy. Deleting the stack does not remove the CloudWatch alarms that the Lambda function created, so you must also delete those alarms yourself.
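Deleting many alarms by hand is tedious, so a short script can help. The following is a minimal sketch that assumes the alarms share a common name prefix; the prefix you pass in is an assumption you need to confirm in the CloudWatch console, and the helper names are hypothetical. It batches deletions because the CloudWatch `DeleteAlarms` API accepts at most 100 alarm names per call.

```python
from typing import Iterator, List


def chunked(items: List[str], size: int = 100) -> Iterator[List[str]]:
    """Split a list into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def delete_alarms_with_prefix(prefix: str, region: str) -> int:
    """Delete every metric alarm whose name starts with `prefix`; return the count."""
    import boto3  # imported lazily so chunked() is usable without AWS access

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    names: List[str] = []
    paginator = cloudwatch.get_paginator("describe_alarms")
    for page in paginator.paginate(AlarmNamePrefix=prefix):
        names.extend(alarm["AlarmName"] for alarm in page["MetricAlarms"])
    for batch in chunked(names, 100):
        cloudwatch.delete_alarms(AlarmNames=batch)
    return len(names)
```

Before running something like this, list the matching alarm names first and confirm that the prefix does not match alarms belonging to other workloads.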

Conclusion

In this post, we showed you a solution that uses an AWS Lambda function to create Amazon CloudWatch alarms to monitor the file system metrics and volume metrics of your choosing for your Amazon FSx for NetApp ONTAP file systems. The solution notifies you through the Amazon SNS endpoint you selected when your alarm thresholds are breached, and when they return to OK status.

Thank you for reading this post. Download the AWS CloudFormation template to get started with this solution, and feel free to add comments to this post with questions or feedback.

Yassin Abouel Seoud

Yassin Abouel Seoud is an Enterprise Support Lead at AWS. He works with users on improving their workloads’ performance and resiliency, and resolving blockers. Prior to AWS, Yassin completed his Bachelor of Engineering in Mechanical Engineering at McGill University, and worked on expanding Alexa’s understanding and answering internationally.

Kartik Bheemisetty

Kartik Bheemisetty is a Sr. Technical Account Manager in the US-ISV segment, where he helps users achieve their business goals with AWS cloud services. He holds subject matter expertise in AWS Network and Content Delivery services. He offers expert guidance on best practices, facilitates access to subject matter experts, and delivers actionable insights on optimizing AWS spend, workloads, and events. You can connect with him on LinkedIn.