AWS Storage Blog

Cross-region disaster recovery with Amazon FSx for NetApp ONTAP

Data protection is a top priority for our customers. Your disaster recovery (DR) strategy might require you to continuously replicate your data across multiple geographic regions to mitigate against natural disasters such as earthquakes or technical disasters that may affect a particular geographic region.

Amazon FSx for NetApp ONTAP, which provides fully managed shared storage built on NetApp’s popular ONTAP file system, offers an efficient and turn-key solution to replicate data across AWS Regions without requiring any external replication servers. In addition, FSx for ONTAP enables asynchronous replication of data from one AWS Region to another, which is applicable to both pilot-light and warm-standby DR strategies. These characteristics can aid in optimizing your DR strategy.

In this post, we discuss the tradeoffs of choosing single-Availability Zone (AZ) and multi-AZ FSx for ONTAP file systems as the destination file system for cross-Region replication. We also cover the prerequisites and performance implications for NetApp SnapMirror, the FSx for ONTAP feature that replicates data between the file systems. With the strategies discussed in this blog post, we aim to empower you with the knowledge to increase the availability and speed of recovery of your business-critical data, all while reducing bandwidth utilization and storage needs through compression and deduplication of the replicated data.

Solution overview for cross-Region data replication

FSx for ONTAP provides feature-rich, fast, and flexible shared file storage that’s broadly accessible from Linux, Windows, and macOS compute instances running in AWS or on-premises. FSx for ONTAP makes it easy for you to meet your DR needs by enabling asynchronous replication of your data across Availability Zones and AWS Regions. See the documentation for more information.

The destination is updated to reflect incremental changes in the source according to a schedule that you specify and remains available whenever you need it. Furthermore, the copy of the volume in the standby AWS Region can be mounted in a read-only manner. Replication can be scheduled as frequently as every five minutes, although intervals should be carefully chosen based on Recovery Point Objectives (RPOs), Recovery Time Objectives (RTOs), and performance considerations.
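To make the relationship between the replication schedule and the RPO concrete, here is a minimal sketch. It rests on a simplified model that is an assumption of this example, not a statement from the FSx for ONTAP documentation: each scheduled update takes a fixed transfer time that is shorter than the schedule interval, so the worst-case data-loss window is roughly the interval plus the transfer time. The function names are hypothetical.

```python
from datetime import timedelta


def worst_case_rpo(schedule_interval: timedelta, transfer_time: timedelta) -> timedelta:
    """Approximate the largest data-loss window for scheduled replication.

    Assumed model: a snapshot taken at time t lands on the destination at
    t + transfer_time. Just before the next update lands (at
    t + schedule_interval + transfer_time), the destination still holds the
    state from time t, so up to interval + transfer of changes can be lost.
    """
    return schedule_interval + transfer_time


def meets_rpo(schedule_interval: timedelta, transfer_time: timedelta,
              rpo_target: timedelta) -> bool:
    """Return True if the assumed worst case fits inside the RPO target."""
    return worst_case_rpo(schedule_interval, transfer_time) <= rpo_target


if __name__ == "__main__":
    interval = timedelta(minutes=5)   # most frequent schedule mentioned above
    transfer = timedelta(minutes=2)   # hypothetical transfer duration
    target = timedelta(minutes=10)    # hypothetical business RPO
    print(worst_case_rpo(interval, transfer), meets_rpo(interval, transfer, target))
```

In practice the transfer time varies with the rate of change on the source and the available bandwidth, so a measured value from your own environment should replace the hypothetical one.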

Cross-Region DR to a single-AZ FSx for ONTAP file system

The single-AZ deployment type is designed to provide high availability and durability within an Availability Zone. The AWS infrastructure powering each single-AZ file system resides in separate fault domains within a single Availability Zone. As is the case with the multi-AZ option, the infrastructure is monitored and replaced automatically, and failover typically completes within seconds.

In the multi-Region setup, you can create a standby FSx for ONTAP file system in the standby AWS Region with a single-AZ configuration. This deployment type offers the same ease of use and data management capabilities as the multi-AZ option, with 50% lower storage costs and 40% lower throughput costs.
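As a rough illustration of the savings quoted above, the following sketch applies the 50% storage and 40% throughput reductions to a multi-AZ baseline. The dollar amounts are hypothetical placeholders, not AWS prices, and the function name is an assumption of this example. Other cost components, such as backups or data transfer, are left out.

```python
STORAGE_REDUCTION_PCT = 50      # single-AZ storage is 50% lower, per the text above
THROUGHPUT_REDUCTION_PCT = 40   # single-AZ throughput is 40% lower, per the text above


def single_az_monthly_cost(multi_az_storage: float, multi_az_throughput: float) -> float:
    """Estimate the single-AZ monthly cost from a multi-AZ baseline.

    Only the storage and throughput components are modeled.
    """
    storage = multi_az_storage * (100 - STORAGE_REDUCTION_PCT) / 100
    throughput = multi_az_throughput * (100 - THROUGHPUT_REDUCTION_PCT) / 100
    return storage + throughput


if __name__ == "__main__":
    # Hypothetical multi-AZ baseline: 1000 for storage, 500 for throughput.
    baseline = 1000 + 500
    estimate = single_az_monthly_cost(1000, 500)
    print(f"multi-AZ: {baseline}, single-AZ estimate: {estimate}")
```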

A single-AZ file system is deployed in the standby AWS Region. The SnapMirror relationship is established between the source volume of the file system in the primary AWS Region and the destination volume of the single-AZ file system in the standby AWS Region.

Figure 1: Cross-Region SnapMirror DR with single-AZ file system in the standby configuration

Cross-Region DR to a multi-AZ FSx for ONTAP file system configuration

In a multi-AZ FSx for ONTAP setup, the data is replicated across multiple Availability Zones. The file system is designed to be resilient to the loss of a single Availability Zone. Depending on RPO/RTO requirements, customers can choose a cross-Region DR setup. The applications are typically deployed in an active-active or warm-standby manner in a multi-Region setup.

Figure 2: Cross-Region DR setup with FSx for ONTAP file system in a multi-AZ setup

A multi-AZ file system is deployed in the standby AWS Region. The SnapMirror relationship is established between the source volume of the file system in the primary AWS Region and the destination volume of the multi-AZ file system in the standby AWS Region.

One consideration when choosing between multi-AZ and single-AZ deployments is the availability of FSx for ONTAP that your business requires. In a multi-AZ deployment, a failure of FSx for ONTAP in one Availability Zone does not result in any downtime. In a single-AZ deployment, a failure of that Availability Zone makes FSx for ONTAP unavailable.

If a regional outage occurs in the primary AWS Region, failover of FSx for ONTAP to the secondary AWS Region is manual, and operational teams must complete it before data can be processed in the secondary AWS Region. Consider the business outcomes for RPO and RTO, and the costs to achieve these outcomes, when deploying FSx for NetApp ONTAP on a multi-AZ, multi-Region architecture.

FSx for ONTAP SnapMirror overview and considerations

The NetApp SnapMirror replication solution is built into NetApp ONTAP for Business Continuity and Disaster Recovery (BCDR) purposes, and it is based on ONTAP snapshot technology. You can use SnapMirror to replicate data from a source FSx for ONTAP file system to a destination FSx for ONTAP file system. When using SnapMirror, the following happens:

  • A snapshot of the data on the source is created.
  • The snapshot is copied to the destination. This process creates a destination that is online, read-only, and contains the same data as the source at the time of the most recent update.
  • The destination is updated to reflect incremental changes on the source according to a schedule you specify.

When a SnapMirror relationship is established, the destination volume is an identical replica of the source, including snapshots, volume settings, and ONTAP storage efficiency features. You can break the SnapMirror relationship during a failover event. This lets you conduct accurate DR testing. Breaking the SnapMirror relationship makes the destination volume writable and does not impact the primary ONTAP volume.
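The snapshot, incremental update, and break behavior described above can be sketched with a toy model. This is an illustration only: the class and method names are hypothetical, and it tracks changes per file in memory, whereas SnapMirror itself works from ONTAP snapshots rather than Python dictionaries.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Volume:
    files: Dict[str, str] = field(default_factory=dict)
    read_only: bool = False

    def write(self, name: str, data: str) -> None:
        if self.read_only:
            raise PermissionError("volume is read-only")
        self.files[name] = data


@dataclass
class MirrorRelationship:
    source: Volume
    destination: Volume
    last_snapshot: Dict[str, str] = field(default_factory=dict)
    broken: bool = False

    def __post_init__(self) -> None:
        # While the relationship is active, clients see a read-only destination.
        self.destination.read_only = True

    def update(self) -> int:
        """Snapshot the source and ship only what changed since the last snapshot."""
        if self.broken:
            raise RuntimeError("relationship is broken")
        snapshot = dict(self.source.files)  # point-in-time copy of the source
        changed = {k: v for k, v in snapshot.items() if self.last_snapshot.get(k) != v}
        deleted = [k for k in self.last_snapshot if k not in snapshot]
        self.destination.files.update(changed)
        for name in deleted:
            del self.destination.files[name]
        self.last_snapshot = snapshot
        return len(changed) + len(deleted)  # number of entries transferred

    def break_mirror(self) -> None:
        """Make the destination writable without touching the source."""
        self.broken = True
        self.destination.read_only = False


if __name__ == "__main__":
    src, dst = Volume(), Volume()
    rel = MirrorRelationship(src, dst)
    src.write("a.txt", "v1")
    src.write("b.txt", "v1")
    print("baseline transfer:", rel.update())
    src.write("b.txt", "v2")
    print("incremental transfer:", rel.update())
    rel.break_mirror()
    dst.write("dr-test.txt", "written during DR test")
    print("source unaffected:", "dr-test.txt" not in src.files)
```

The first update copies everything (the baseline), later updates copy only the differences, and after the break the destination accepts writes while the source stays as it was.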

A comprehensive guide to configure the SnapMirror relationship between the source and destination file systems can be found at Migrating to FSx for ONTAP using NetApp SnapMirror. At the DR site, the destination volume can be mounted in a read-write manner using the instructions at cutting over to Amazon FSx.

SnapMirror replication traffic

In a multi-AZ deployment, SnapMirror replicates the data between the file systems using their intercluster logical interfaces (LIFs). When the source is a multi-AZ FSx for ONTAP deployment, the file system is created with a single intercluster LIF on each node of the HA pair, and each of these LIFs resides in a different Availability Zone. In the event of an outage in one Availability Zone during replication to another FSx for ONTAP file system, the replication traffic continues to flow after failover to the node in the standby Availability Zone. This allows file serving to continue in the primary AWS Region and has no impact on SnapMirror replication to the secondary AWS Region for DR purposes.

SnapMirror networking prerequisites

The following prerequisites are required for SnapMirror networking:

  • Network connectivity between primary and standby FSx for ONTAP systems must exist. This can be accomplished either by VPC peering or connecting the VPCs to an AWS Transit Gateway.
  • SnapMirror uses ports 10000, 11104, and 11105. The security groups associated with the FSx for ONTAP ENIs should allow traffic from the peer cluster on these ports.
  • Every intercluster LIF on the local cluster must be able to communicate with every intercluster LIF on the remote cluster.
  • The name and IP address of the source system must be in the vserver services dns hosts file of the destination system and vice versa. Alternatively, if you want to use DNS names to establish cluster peer relationships, the names must be resolvable through DNS.
  • SnapMirror replication compatibility between the FSx for ONTAP file systems is described in the Unified Replication relationships section of the Compatible ONTAP versions for SnapMirror relationships article in the NetApp ONTAP documentation center.
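The port and LIF connectivity prerequisites above can be sketched in Python. The helper names, the example CIDR range, and the IP addresses are hypothetical, and the rules assume TCP, which this post does not state explicitly. The ingress rules use the shape that boto3's `authorize_security_group_ingress` call accepts in its `IpPermissions` parameter. The call is wrapped in a separate function so the rest of the sketch runs without AWS credentials.

```python
from itertools import product
from typing import Dict, List, Tuple

SNAPMIRROR_PORTS = (10000, 11104, 11105)  # ports listed in the prerequisites above


def snapmirror_ingress_rules(peer_cidr: str) -> List[Dict]:
    """Build one ingress rule per SnapMirror port for traffic from the peer cluster."""
    return [
        {
            "IpProtocol": "tcp",  # assumption: TCP for all three ports
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": peer_cidr, "Description": "SnapMirror peer cluster"}],
        }
        for port in SNAPMIRROR_PORTS
    ]


def lif_pairs(local_lifs: List[str], remote_lifs: List[str]) -> List[Tuple[str, str]]:
    """List every local/remote intercluster LIF pair that must be able to communicate."""
    return list(product(local_lifs, remote_lifs))


def apply_rules(security_group_id: str, peer_cidr: str, region: str) -> None:
    """Apply the rules with boto3 (requires boto3 and AWS credentials)."""
    import boto3  # third-party dependency, imported lazily

    ec2 = boto3.client("ec2", region_name=region)
    ec2.authorize_security_group_ingress(
        GroupId=security_group_id,
        IpPermissions=snapmirror_ingress_rules(peer_cidr),
    )


if __name__ == "__main__":
    # Hypothetical peer VPC CIDR and intercluster LIF addresses.
    for rule in snapmirror_ingress_rules("10.1.0.0/16"):
        print(rule["FromPort"], rule["IpRanges"][0]["CidrIp"])
    for local, remote in lif_pairs(["10.0.1.10", "10.0.2.10"], ["10.1.1.10", "10.1.2.10"]):
        print(f"{local} <-> {remote}")
```

The same rules are needed on both sides, with each security group allowing the other cluster's address range. With two intercluster LIFs per file system, the pair list contains four paths to verify.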

Conclusion

In this post, we covered how Amazon FSx for NetApp ONTAP provides a turn-key DR solution to replicate data asynchronously across different AWS Regions. We discussed using both single-AZ and multi-AZ configurations for the destination file system. Based on the designs cited here, a single-AZ design is more cost effective but does not have the same level of fault tolerance as a multi-AZ design. When architecting any DR solution, factor in your business needs and requirements when making choices that affect the cost of the DR solution. SnapMirror enables you to configure replication with an RPO as low as five minutes and an RTO in single-digit minutes.

With Amazon FSx for NetApp ONTAP, you receive all of the benefits of a fully managed service, simplifying your data management and reducing the costs of running on-premises infrastructure. Augmenting your data protection strategy with NetApp’s SnapMirror further offers enterprise-level data protection to make sure your data is protected and available. Use these services and tools today and streamline your operations with Amazon FSx for NetApp ONTAP!

Joe Dunn

Joe is an AWS Principal Solutions Architect in Financial Services with over 20 years of experience in infrastructure architecture and migration of business-critical loads to AWS. He helps financial services customers to innovate on the AWS Cloud by providing solutions using AWS products and services.

Amit Borulkar

Amit is a Principal Solutions Architect with Amazon Web Services (AWS) focused on helping customers craft highly resilient and scalable cloud architectures that address their business problems. He also holds a Master's degree in Computer Science from North Carolina State University.