Microsoft Workloads on AWS

SQL Server High Availability Deployments Using Amazon FSx for NetApp ONTAP

UPDATE: On 26th May 2022, AWS Launch Wizard added support to SQL Server deployments using Amazon FSx for NetApp ONTAP. AWS Launch Wizard uses a simple console wizard to simplify the deployment of SQL Server Failover Cluster Instances (FCI) with Amazon FSx for NetApp ONTAP. To learn more about using AWS Launch Wizard, visit the technical documentation.


When deploying a mission-critical workload, it is of paramount importance to ensure that all components are highly available. Customers choosing to use SQL Server within their mission-critical workloads can achieve high availability by deploying SQL Server Always On. SQL Server Always On can be deployed in two main configurations: Always On Failover Cluster Instances (FCI) and Always On Availability Groups (AGs).

In this blog, we will focus on how to deploy SQL Server Always On Failover Cluster Instances (Always On FCIs). Up until now, the simplest and most cost-effective way to deploy Always On FCIs on AWS has been by using Amazon FSx for Windows File Server for the shared storage layer. But at AWS, we are always looking for more ways to meet the needs of our customers. To this end, we recently launched Amazon FSx for NetApp ONTAP (FSx for ONTAP) as a fully managed, highly reliable, scalable, performant, and feature-rich file storage solution built on NetApp’s popular ONTAP file system. FSx for ONTAP provides multi-protocol access to data over the NFS, SMB, and iSCSI protocols for Windows and Linux systems.

An Always On FCI enables you to run a SQL Server instance across multiple Windows Server Failover Cluster (WSFC) nodes. The SQL instance is active on a single node within the cluster, and in the event of a failure, the ownership of the SQL instance and all associated resources is transferred to another node within the WSFC. Shared storage is required to enable this configuration. One of the key benefits of this configuration is that it negates the need to upgrade from SQL Standard to SQL Enterprise solely to support Always On AGs. Also, as SQL Server FCI provides protection on the instance level, all administrative (like creating and managing logins and users) and scheduling (SQL Agent jobs and schedules) changes are automatically propagated from active to passive FCI instances within the WSFC through the shared storage.

SQL Server Always On FCIs support the use of shared storage presented via both SMB and iSCSI. For SMB, AWS customers have been benefitting since 2018 from the use of FSx for Windows File Server. With the release of FSx for ONTAP, customers can also choose to use SMB on FSx for ONTAP or iSCSI on FSx for ONTAP. In this blog post, we will walk you through how to set up a SQL Server Always On FCI with FSx for ONTAP as shared iSCSI storage.

Solution Overview 

FSx for ONTAP file systems are highly available and durable across AWS Availability Zones (AZs) and are designed to provide continuous availability to data. Each file system is powered by two file servers in separate AZs, each with its own storage. Amazon FSx automatically replicates your data across AZs to protect it from component failure, continuously monitors for hardware failures, and automatically replaces infrastructure components in the event of a failure.

FSx for ONTAP file systems are set up as Multi-AZ by default. Each file system has a preferred (active) subnet and a standby (passive) subnet. In our scenario, Availability Zone A hosts the preferred subnet and Availability Zone B hosts the standby subnet. The FSx for ONTAP file system provides management and protocol specific endpoints for each AZ. Your SQL Server FCI nodes access your FSx for ONTAP file system through elastic network interfaces (ENI). These network interfaces reside in the Amazon VPC that you associate with your file system. Clients access the FSx for ONTAP file system via these ENIs (Preferred & Standby).

As a good practice, the active SQL Server FCI node should be in the same subnet as the FSx for ONTAP file system preferred subnet in order to achieve the best throughput and lowest latency, and to avoid unnecessary inter-AZ network traffic.

In the following walkthrough, we set up a two-node SQL Server FCI with FSx for ONTAP providing shared storage leveraging the iSCSI protocol.

We recommend configuring both the preferred and the standby iSCSI endpoints on the client side (SQL Server FCI nodes) as iSCSI targets. The MPIO driver recognizes multiple paths to the storage. Failover and failback operations are handled by the MPIO failover policy and ALUA. The MPIO failover policy should be set to “round robin” so that only active/optimized paths from the iSCSI client (SQL Server FCI nodes) to the iSCSI target (FSx for ONTAP file system) are used at any given time.

Figure 1 - SQL Server FCI setup with iSCSI shared disks (FSx for ONTAP)

Figure 1 – SQL Server FCI setup with iSCSI shared disks (FSx for ONTAP)

Important Considerations

When configuring SQL workloads, it is important to consider how you intend to back up your databases. The FSx for ONTAP service provides automated backup functionality including built-in backups and snapshots. For simplicity, in this example scenario, we disable FSx for ONTAP automated backups and snapshots in favor of SQL native tools. If you decide to leverage FSx for ONTAP’s advanced data protection functionality, then you must consider the additional storage requirements for backup snapshots.

Access to the FSx for ONTAP file system is over the network, so you should choose an Amazon Elastic Compute Cloud (Amazon EC2) instance with sufficient networking performance to support your workload. In this walkthrough, we will use r5n.4xlarge instances as SQL Server FCI nodes. This EC2 instance provides up to 25Gbps network bandwidth. Having an appropriate instance type and size to sustain both the network and the iSCSI traffic is crucial for optimal performance. As always, please test your workloads before porting them into production. If you have especially high disk throughput requirements (e.g. greater than 1024MB/s), consider enabling Jumbo frames on your instances and configuring additional iSCSI sessions to each iSCSI target.

Please review the NetApp documentation to configure your storage based on your workload and performance requirements: Best Practices Guide for Microsoft SQL Server with ONTAP. We have detailed FSx for ONTAP settings in step 7 of the walkthrough below, together with the commands required to set them.

Prerequisites and Basic Setup

Before starting the walkthrough, make sure you have completed or prepared the following steps/requirements:

  • Microsoft Active Directory and a privileged AD user account to perform configuration of Windows instances. In this walkthrough, we use AWS Managed Microsoft AD.
  • Active Directory (AD) domain controllers must have all necessary ports opened for the SQL Server nodes to join the domain and authenticate against AD.
  • Microsoft SQL Server installation media. For this walkthrough, we use SQL Server 2019 Developer edition.
  • Launch 2 Amazon EC2 Windows Server 2019 instances in 2 private subnets across two Availability Zones inside a Virtual Private Cloud (VPC).
  • You can use RDP to connect from a jump/bastion server to the cluster node instances or use AWS Systems Manager Fleet Manager.
  • Configure 2 additional secondary IPs on the same ENI for each cluster node (3 IPs in total per EC2 instance).
  • Join the nodes to the domain. Verify that you are logged in using domain credentials before creating and configuring the cluster.
  • Configure security group inbound rules and Windows Firewall exceptions to allow the nodes to communicate in a restrictive environment.
  • Determine the storage space requirements (e.g. data, logs, backup and NetApp ONTAP Snapshot storage space).
  • Set up an FSx for ONTAP file system. For this walkthrough, we configured FSx for ONTAP with 2TB of SSD storage capacity, 512MB/s throughput capacity and 20,000 Provisioned IOPS. Please review the Amazon FSx for NetApp ONTAP performance page.
  • For this example workload, we have configured backups using native SQL server technology to separate storage. For this reason, we have disabled the FSx for ONTAP automatic backups as we discussed in the Important Considerations section.

Table 1 - IP addresses used in this walkthrough

Table 1 – IP addresses used in this walkthrough

Walkthrough

Evaluate the commands in this walkthrough in a test environment before attempting to use them in production.

Setup Amazon FSx for ONTAP file system and configure iSCSI target settings

First, follow the steps in FSx for ONTAP documentation to create a file system. In this walkthrough, we use a number of scripts and commands referring to the FSx for ONTAP storage virtual machine (svm) as sql-svm01. If you want to save yourself some time, use the same name for your svm. While creating the FSx for ONTAP file system using the standard create option, you may leave default volume creation as blank and you do not need to join the svm to AD. We will create the required ONTAP volumes later in this walkthrough. We use the default VPC and VPC’s default route table.

Then we install and configure the Windows features that we need in order to complete this walkthrough.

1. Connect to your EC2 instance using RDP with an AD user account that has local Administrator permissions on both nodes.

2. On each instance, install the required features (Multipath-IO, Failover-Clustering, RSAT-AD-Tools, RSAT-DNS-Server) using Server Manager, or run the following PowerShell (run as Administrator) command:

Install-WindowsFeature RSAT-AD-Tools, RSAT-DNS-Server, Multipath-IO, Failover-Clustering -IncludeManagementTools -Restart

3. On each instance, start the Microsoft iSCSI Initiator Service and set the startup type to ‘Automatic’ from the Services snap-in, or run the following PowerShell (run as Administrator) command:

#Start iSCSI initiator
Start-Service MSiSCSI
Set-Service -Name msiscsi -StartupType Automatic

4. On each instance, go to iSCSI initiator properties and choose the configuration tab and write down the initiator name, or run the following PowerShell (run as Administrator) command:

#Get iSCSI initiator name of cluster nodes
(Get-InitiatorPort).NodeAddress

5. Write down the management endpoint IP address from the FSx for ONTAP console.

Figure 2 - FSx for ONTAP Console – File system Endpoint IP addresses

Figure 2 – FSx for ONTAP Console – File system Endpoint IP addresses

6. Connect to your FSx for ONTAP system using SSH (e.g. ssh fsxadmin@x.x.x.x from PowerShell). If you cannot access your file system, follow the instructions here.

7. From your FSx for ONTAP ssh session, use the following commands to create a volume, an iGroup, 3 LUNs (Quorum, Data and Logs) and map these LUNs to the iGroup. Customise these commands based on your own requirements and naming conventions. Ensure you update the iSCSI initiator names below with the ones that you recorded in step 4. The command will succeed even if you do not update the initiator names, but your SQL Server FCI nodes will be unable to connect to the iSCSI target.

volume create -vserver sql-svm01 -volume SQLCluster01 -aggregate aggr1 -size 500G -state online -tiering-policy snapshot-only -percent-snapshot-space 0 -autosize-mode grow -snapshot-policy none
volume modify -vserver sql-svm01 -volume SQLCluster01 -fractional-reserve 0
volume modify -vserver sql-svm01 -volume SQLCluster01 -space-mgmt-try-first snap_delete
volume snapshot autodelete modify -vserver sql-svm01 -volume SQLCluster01 -delete-order oldest_first -enabled true
igroup create -igroup SQLCluster01-IG -ostype windows -protocol iscsi -vserver sql-svm01 -initiator iqn.1991-05.com.microsoft:sqlnode1.corp.aws.example.com,iqn.1991-05.com.microsoft:sqlnode2.corp.aws.example.com
lun create -vserver sql-svm01 -volume SQLCluster01 -lun quorum -size 1G -ostype windows_2008
lun create -vserver sql-svm01 -volume SQLCluster01 -lun sqldata01 -size 200G -ostype windows_2008
lun create -vserver sql-svm01 -volume SQLCluster01 -lun sqllogs01 -size 100G -ostype windows_2008
lun map -vserver sql-svm01 -volume SQLCluster01 -lun quorum -igroup SQLCluster01-IG
lun map -vserver sql-svm01 -volume SQLCluster01 -lun sqldata01 -igroup SQLCluster01-IG
lun map -vserver sql-svm01 -volume SQLCluster01 -lun sqllogs01 -igroup SQLCluster01-IG

These commands configure the following settings:

Table 2 - ONTAP configuration for SQL Server FCI setup in this walkthrough

Table 2 – ONTAP configuration for SQL Server FCI setup in this walkthrough

Configure iSCSI Storage and WSFC on Amazon EC2 Windows

Now we will initiate and configure the EC2 Windows instances with iSCSI connections and disks.

8. On each EC2 instance, retrieve all the IP addresses (3 IPs per EC2 instance) and write them down. These IP addresses are used in subsequent steps.

9. Get the iSCSI endpoint IP addresses (preferred subnet and standby subnet) from the FSx for ONTAP console. Choose your svm (in this walkthrough sql-svm01) on the Storage Virtual Machines page. These IP addresses are used as iSCSI targets in the next step.

Figure 3 - FSx for ONTAP Console - ISCSI Endpoint IP addresses

Figure 3 – FSx for ONTAP Console – ISCSI Endpoint IP addresses

10. On each instance, go to iSCSI initiator settings and connect to your iSCSI targets (FSx for ONTAP iSCSI endpoints), or run the following PowerShell (run as Administrator) script – (Replace the IP addresses with the ones you retrieved in the previous step):

#Retrieve iSCSI target endpoint IP Addresses from AWS FSx for ONTAP page.
#IP addresses for Preferred and Standby subnets
$TargetPortalAddresses = @("172.31.10.123","172.31.20.123")
#Primary IP address of EC2 instance recorded in step 1. 
#In our demo this is 172.31.10.100 (SQLnode1) and 172.31.20.100 (SQLnode2). Refer to the diagram above.
$LocaliSCSIAddress = "172.31.10.100"
#Connect to FSx for Netapp ONTAP file system
Foreach ($TargetPortalAddress in $TargetPortalAddresses){
New-IscsiTargetPortal -TargetPortalAddress $TargetPortalAddress -TargetPortalPortNumber 3260 -InitiatorPortalAddress $LocaliSCSIAddress
}
#Add MPIO support for iSCSI
New-MSDSMSupportedHW -VendorId MSFT2005 -ProductId iSCSIBusType_0x9
#Establish iSCSI connection
1..3 | %{Foreach($TargetPortalAddress in $TargetPortalAddresses){Get-IscsiTarget | Connect-IscsiTarget -IsMultipathEnabled $true -TargetPortalAddress $TargetPortalAddress -InitiatorPortalAddress $LocaliSCSIAddress -IsPersistent $true} }
#Set the MPIO Policy to Round Robin
Set-MSDSMGlobalDefaultLoadBalancePolicy -Policy RR

Figure 4 – Disk Device Properties, MPIO tab for LUN connected via iSCSI

Figure 4 – Disk Device Properties, MPIO tab for LUN connected via iSCSI

11. On the EC2 instance in Availability Zone A, go to the Windows disk management tool and initialize and format each new iSCSI disk, or run the following PowerShell script (run as Administrator):

#Retrieve a list of FSx for ONTAP disks 
$disklist=Get-Disk | Where-Object{$_.FriendlyName -eq 'NETAPP LUN C-MODE'} | Sort-Object -Property Size 
#Create a list of drive letters
$driveletters=@("Q","L","S")
#Initiate, create and format volumes from the list of available FSx for ONTAP disks

foreach($dk in $disklist)
{   
    if(($dk).IsOffline -eq $True){
       Set-Disk -Number ($dk).Number -IsOffline $False

    }
    if(($dk).PartitionStyle -eq 'RAW'){
        Initialize-Disk -Number ($dk).Number -PartitionStyle GPT -ErrorAction SilentlyContinue
    }
    if($dk.IsReadOnly -eq $True){ 
        Set-Disk -Number ($dk).Number -IsReadOnly $False
    }
} 

New-Partition -DiskNumber ($disklist[0]).Number -UseMaximumSize -DriveLetter $driveletters[0] | Format-Volume -FileSystem NTFS -Force -NewFileSystemLabel Quorum
New-Partition -DiskNumber ($disklist[1]).Number -UseMaximumSize -DriveLetter $driveletters[1] | Format-Volume -FileSystem NTFS -AllocationUnitSize 65536 -Force -NewFileSystemLabel SQL-Log
New-Partition -DiskNumber ($disklist[2]).Number -UseMaximumSize -DriveLetter $driveletters[2] | Format-Volume -FileSystem NTFS -AllocationUnitSize 65536 -Force -NewFileSystemLabel SQL-Data 

12. On the EC2 instance in Availability Zone A, set up the WSFC using Failover Cluster Manager or run the following PowerShell script (run as Administrator):

Ensure you change the $WSFCClusterName and $ClusterNodes variables to match your environment, as well as altering the $ClusterIPs variable to include the first of the secondary IP addresses of each node (recorded in step 8). In our walkthrough, SQLNode1 has a primary IP of 172.31.10.100 and secondary IP addresses of 172.31.10.101 and 172.31.10.102, so we will use 172.31.10.101. We will use the third IP address during the SQL Server FCI installation.

# Populate the cluster node names and set the cluster name for the WSFC
$WSFCClusterName = "WSFC1"
$ClusterNodes = ("SQLNode1", "SQLNode2")
$ClusterIPs = ("172.31.10.101", "172.31.20.101")

# Create a Cluster
New-Cluster -Name $WSFCClusterName -Node $ClusterNodes -AdministrativeAccessPoint ActiveDirectoryAndDNS -StaticAddress $ClusterIPs

# Rename Cluster Resources
$ClusterIPResources = Get-ClusterResource | Where-Object {$_.ResourceType -eq "IP Address"} | Select-Object -ExpandProperty Name
$ClusterIPResources.ForEach( { (Get-ClusterResource -Name "$_").Name = Get-ClusterResource -Name "$_" | Get-ClusterParameter -Name Address | Select-Object -ExpandProperty Value } )
(Get-ClusterGroup "Cluster Group").Name = $WSFCClusterName
(Get-ClusterResource | Where-Object {$_.ResourceType -eq "Network Name" -and $_.OwnerGroup -eq $WSFCClusterName}).Name = $WSFCClusterName

# Set possible owners for IP addresses - each node can only own its own IP
for($i = 0; $i -le 1; $i++){ Get-Cluster -Name $WSFCClusterName | Get-ClusterResource -Name $ClusterIPs[$i] | Set-ClusterOwnerNode -Owners $ClusterNodes[$i] }

# Set dependencies for the cluster name object to be either/or for the node IP addresses
Set-ClusterResourceDependency -Resource $WSFCClusterName -Dependency ("[$($ClusterIPs[0])] or [$($ClusterIPs[1])]")

# Test the cluster
Test-Cluster

Figure 5 – Disks shown in Failover Cluster Manager

Figure 5 – Disks shown in Failover Cluster Manager

SQL Server Failover Cluster Installation

13. Run the SQL Server Setup from the installation media, and then select installation and New SQL Server Failover Cluster Installation. Complete the cluster installation wizard with your choice of configuration parameters on both SQL Server FCI nodes. Here we highlight some important steps of the installation process.

14. In Cluster Network Configuration, select the IP address line that corresponds to the current SQL node’s subnet and clear the DHCP checkbox and use the third IP address of the SQL Server nodes (Secondary IP 2 for each node from Table 1). In our walkthrough, SQLNode1 is configured with 172.31.10.102 as the third IP address.

Figure 6 - Cluster Installation Wizard: SQLNode1 Cluster Network Configuration

Figure 6 – Cluster Installation Wizard: SQLNode1 Cluster Network Configuration

15. During installation, you can choose the “Grant Perform Volume Maintenance Task” option to enable the instant file initialization feature for SQL Server.

Figure 7 - Cluster Installation Wizard:  Server Configuration

Figure 7 – Cluster Installation Wizard: Server Configuration

After you finish installation of the first node of your SQL FCI cluster, you can go ahead and add additional nodes to your SQL Server FCI.

Clean Up

It is a best practice to delete resources that you are no longer using so that you do not incur unintended charges. You can clean up the following resources you created for this tutorial:

Conclusion

Customers have been deploying Microsoft SQL Always On Failover Cluster Instances on NetApp storage on premises very successfully for many years. With Amazon FSx for NetApp ONTAP, customers are now able to deploy and configure SQL FCIs on fully managed storage built on NetApp’s ONTAP file system in the AWS Cloud.

In this blog post, we walked you through how to set up an FSx for ONTAP file system, provision and optimize ONTAP volumes, set up iSCSI connections, and finally, deploy a SQL Server Always On FCI with shared iSCSI storage.


AWS can help you assess how your company can get the most out of cloud. Join the millions of AWS customers that trust us to migrate and modernize their most important applications in the cloud. To learn more on modernizing Windows Server or SQL Server, visit Windows on AWS. Contact us to start your migration journey today.

Andy Ward

Andy Ward

Andy Ward is a Senior Specialist Solutions Architect at AWS, focusing on Microsoft Workloads. Andy has been working with Microsoft technology for over 20 years and helps customers and partners with running, transforming and optimising their Microsoft Workloads on AWS.

Baris Furtinalar

Baris Furtinalar

Baris Furtinalar is a Principal Solutions Architect who is part of specialists in Microsoft architectures team at AWS. He is passionate about cloud computing and he believes this shift to the cloud helps companies transform their businesses, gain agility, and improve operational responsiveness. He comes from a diverse background including SQL database administration, virtualization and system security. He designed, implemented and supported Windows/SQL server deployments since 2000.