AWS Storage Blog

Simplify your Microsoft SQL Server high availability deployments using Amazon FSx for Windows File Server

Customers have been running their Windows workloads on AWS for over a decade. Running business-critical applications often requires Microsoft SQL Server databases to be highly available. The high availability requirement demands that there are no single points of failure in the SQL Server deployment architecture. With a high availability setup, you have a cluster of two nodes with a primary and a secondary. If the primary fails, the cluster automatically fails over to the secondary, thus ensuring continued database activity for your applications. Microsoft SQL Server provides a set of features under the umbrella of SQL Server Always On to enable such redundant, high availability deployments.

SQL Server Always On (specifically, the Failover Cluster Instance deployment option, which is the preferred deployment option for high availability) has been traditionally difficult to deploy and manage. This difficulty stems from it requiring shared storage to enable the secondary node to continue to have access to the database storage upon a failover, without the need to replicate data between the nodes.

We recently introduced new features on Amazon FSx for Windows File Server to simplify deploying and using shared storage to host your databases. These features reduce complexity and cost of running SQL Server Always On deployments. We introduced native support for multi-Availability Zone file system deployments on Amazon FSx for Windows File Server (Amazon FSx). Native support makes it easy to deploy Windows file storage in AWS with high availability and redundancy across multiple Availability Zones. Along with this launch, we also introduced support for SMB Continuously Available (CA) file shares. This new capability enables you to simplify your SQL Server Always On deployments by using fully managed, highly available shared storage on Amazon FSx to host your databases.

In this blog post, we first provide a brief background on the deployment options for SQL Server Always On. Then we talk about why the preferred option for high availability deployments has traditionally been difficult to deploy and manage. Walking you through step-by-step how to do so, we then demonstrate how using Amazon FSx to run SQL Server Always On deployments helps you reduce complexity and cost.

SQL Server Always On overview

Microsoft SQL Server provides two deployment options of its Always On solution for business continuity use cases like high availability and disaster recovery: Always On Failover Cluster Instances (FCI) and Always On Availability Groups (AG). Both these deployment options use Windows Server Failover Cluster (WSFC) technology to provide a cluster of independent nodes that work together to deliver high availability, but there are important differences:

  • FCIs present a single SQL Server instance that is installed across nodes in a WSFC to provide high availability for the entire installation of SQL Server. This means that everything inside the SQL Server instance moves to another node in the cluster should the underlying node encounter a problem. Among other things, system databases, SQL Server logins, SQL Server Agent jobs, and certificates get moved to another node. FCIs require some form of shared storage – either shared block storage (SAN) or shared file storage (accessed via SMB).
  • AGs represent one or more user databases that fail over together across multiple replicas hosted in the cluster to provide database-level high availability. An AG consists of a primary replica and one or more secondary replicas that are maintained through SQL Server log-based data movement for data protection without the need for shared storage. Each replica is hosted by an instance of SQL Server on a different node of the WSFC, and each replica has its own local storage independent of the AG.

Why use Always On FCI?

FCI is generally preferable over AG for SQL Server high availability deployments when the following are priority concerns for your use case:

  • License cost efficiency: You need the Enterprise Edition license of SQL Server to run AGs, whereas you only need the Standard Edition license to run FCIs. This is typically 50–60% less expensive than the Enterprise Edition. Although you can run a Basic version of AGs on Standard Edition starting from SQL Server 2016, it carries the limitation of supporting only one database per AG. This can become a challenge when dealing with applications that require multiple databases like SharePoint.
  • Instance-level protection versus database-level protection: With FCI, the entire instance is protected – if the primary node becomes unavailable, the entire instance is moved to the standby node. This takes care of the SQL Server logins, SQL Server Agent jobs, certificates, etc. that are stored in the system databases, which are physically stored in shared storage. With AG, on the other hand, only the databases in the group are protected, and system databases cannot be added to an AG – only user databases are allowed. It is the database administrator’s responsibility to replicate changes to system objects on all AG replicas. This leaves the possibility of human error causing the database to become inaccessible to the application.
  • DTC feature support: If you’re using SQL Server 2012 or 2014, and your application uses Distributed Transaction Coordinator (DTC), you are not able to use an AG as it is not supported. Use FCI in this situation.

That said, there are scenarios other than high availability where AG is preferable over FCI. For example, if you must configure the additional nodes as read replicas to enable higher read performance scalability.

What about the shared storage requirement?

While FCI is preferable to AG in many scenarios, it has traditionally been difficult to deploy and manage. This is because it requires shared storage to enable the secondary node to continue to have access to the database storage upon a failover without the need to replicate data between the nodes. From a high availability point-of-view, the shared storage must be resilient and highly available. Otherwise, it becomes a single point of failure because the FCI remains offline if the shared storage becomes unavailable, regardless of the number of nodes in the failover cluster. Until recently, you needed dedicated administrators responsible for the complexities of architecting and maintaining a robust shared storage solution that provides high availability and resiliency. Additionally, you typically paid for third-party storage replication software solutions (like SIOS DataKeeper and StarWind Virtual SAN) over and above the storage infrastructure costs.

Reduce complexity and cost using Amazon FSx

With Amazon FSx, you get fully managed shared file storage that automatically replicates the storage synchronously across two Availability Zones. Moreover, Amazon FSx provides high availability with automatic failure detection, failover, and failback. The service also fully supports the SMB Continuous Availability (CA) feature required to support SQL Server Always On FCI deployments.

For those of you who chose to deploy SQL AG with Enterprise Edition licenses, to avoid the complexity and cost of using FCI with shared storage, now you can use Standard Edition licenses. This will save you 50–60% in license costs. It will also simplify the overall complexity of your SQL deployment and ongoing management (like not needing to replicate system database objects across all replicas, which you must do in AG deployments).

If you use SQL Server FCI with shared storage using a third-party storage replication software solution, you must purchase a license for the storage replication solution, and then deploy, administer, and maintain the shared storage solution yourself. You now have the option of using a fully managed shared storage solution with Amazon FSx to simplify your SQL Server FCI deployment.

If you are running your SQL Server Always On deployment on-premises, you are likely using a combination of FCI and AG. FCI to provide high availability within your primary data center site (because shared storage cannot typically span multiple data centers), and AG to provide a disaster recovery solution across sites. AWS’s Availability Zone architecture and Amazon FSx’s support for highly available shared storage deployed across multiple Availability Zones now make it possible for you to eliminate the need for separate high availability and disaster recovery solutions. This reduces costs as well and simplifies deployment complexities.

Finally, another way you can use Amazon FSx to simplify your SQL Server FCI deployment is for the Windows file share witness for your SQL cluster. A Windows file share witness is a file share that is available to all nodes in a high availability cluster. The job of the witness is to provide an additional quorum vote when necessary to ensure that a cluster continues to run if there is a site outage. With Amazon FSx, you can easily and cost-effectively use a Single-AZ file system that is fully managed to host the file share witness.

Architecture

Amazon FSx provides a file share with the following capabilities to host SQL Databases:

  • Support for SMB versions 2.0 through 3.1.1
  • Encryption in transit using SMB Kerberos
  • Customizable performance (throughput and IOPS)
  • Multi-AZ deployment (HA) with support for SMB Continuous Availability (also known as SMB Transparent Failover)
  • Predictable cost with no charge for Inter-AZ data replication
  • Single-AZ file system can be used for file share witness when creating a Windows Failover cluster or as a target for SQL backups. The minimum size of 32 GB should be sufficient for the file share witness.

When using Amazon FSx as the storage solution for a SQL failover cluster instance, the targeted architecture should be as follows:

When using Amazon FSx as the storage solution for a SQL failover cluster instance, the targeted architecture should be as follows

Amazon FSx file systems should be deployed in the same Availability Zones as the SQL Server FCI cluster nodes. Additionally, the file share witness should be deployed into a third Availability Zone in order to keep the vote majority if there is a loss of one Availability Zone. For identity, Amazon FSx supports the use of either an AWS Managed Microsoft Active Directory or a self-managed Microsoft Active Directory.

Note: We are using AWS Managed Microsoft Active Directory with Amazon FSx in this walk through, and are not covering the self-managed Active Directory case. For more information about how to use your self-managed Active Directory with Amazon FSx, see Using Amazon FSx for Windows File Server with an on-premises Active Directory. Also, we are not covering the deployment of Active Directory in this article and assume it is already deployed. For more information, see Getting Started with AWS Managed Microsoft Active Directory. Finally, we are not covering the creation of the SQL Server cluster itself. For more information, see SQL Server with WSFC on AWS.

How to configure Amazon FSx as shared storage for Microsoft SQL Server

The first step is to create the Multi-AZ Amazon FSx file system joined to your domain. It is key to properly size your Amazon FSx file system. The storage capacity determines not only how much data you can store, but also how many I/O operations per second (IOPs) you can perform – each GB of storage provides three IOPs. The throughput capacity of your file system determines the speed at which the file system can serve file data to the SQL Server instances accessing it. As an example, in our testing we used a 6000-GB storage capacity (to deliver a maximum of 18000 IOPS) along with a throughput capacity level of 128 MBps.

As an example, in our testing we used a 6000-GB storage capacity (to deliver a maximum of 18000 IOPS) along with a throughput capacity level of 128 MBps

The next step is to set the right network settings. Make sure you deploy the Amazon FSx file system in the same two Availability Zones as those associated with your SQL Server cluster.

The next step is to set the right network settings

Select your Active Directory – in our case it is an AWS Managed Microsoft Active Directory. Amazon FSx also supports using an AWS Microsoft Active Directory in a different VPC or account via directory sharing.

Select your Active Directory

Then select Create file system.

Create the Microsoft SQL Server file share on your file system

The second step is to create a file share on your file system and set proper permissions and availability settings.

First, on the Amazon FSx Console, select your file system and identify the DNS name for your file system – amznfsxmzqnv8tv.SQLonFSx.AWS in our example. Then map the D$ share on your file system using its DNS name (\\amznfsxmzqnv8tv.SQLonFSx.AWS\D$ in our example), and create a folder called SQLDB within it.

Next, on the Amazon FSx Console, select your file system and identify the Windows Remote PowerShell Endpoint name for your file system. Run the command that follows to create the file share. We are setting the ContinuouslyAvailable property of the file share to True below – this is the shared storage feature required to ensure high availability for your SQL Server deployment. Make sure you properly edit the command with your own values before running it.

On the Amazon FSx Console, select your file system and identify the Windows Remote PowerShell Endpoint name for your file system

$FSX = "amznfsxipf5ovad.SQLonFSx.AWS"
Invoke-Command -ComputerName $FSX -ConfigurationName FSxRemoteAdmin -scriptblock {
New-FSxSmbShare -Name "SQLDB" -Path "D:\SQLDB" -Description "SQL Databases Share" -ContinuouslyAvailable $true -FolderEnumerationMode AccessBased -EncryptData $true
grant-fsxsmbshareaccess -name SQLDB -AccountName "sqlonfsx.aws\SQLSA","sqlonfsx.aws\DBAdmins","sqlonfsx.aws\SQLServers" -accessRight Full
}

Replace $FSx with your file system’s Windows Remote PowerShell Endpoint and modify the account names. In our example we use three identities: the SQL service account, a global group for DBAs, and another Global Group in which all SQL Servers are members.

Output:

the output of your commands

Note: If you don’t use a SQL Service account but instead use system account, you must add all SQL system accounts named. This includes the Virtual Computer Object (VCO) of the SQL cluster as Domain\system$.

NTFS permissions

The previous step only set permissions at the share-level. We now must set file-level permissions.

On the Amazon FSx Console, navigate to your file system’s details screen and identify the DNS name of your file system.

On the Amazon FSx Console, navigate to your file system’s details screen and identify the DNS name of your file system

Next, on an EC2 Windows instance that has connectivity to your file system, go to File Explorer and enter the following UNC path:

\\DNSFSX\d$

Then right click on the SQLDB folder and edit the NTFS permissions. Full Control is granted to the three identities mentioned above.

Then right click on the SQLDB folder and edit the NTFS permissions

We can as an alternative use a single script to create the share and set proper NTFS permissions.

For example, in our case we would use the following:

## Variables

$FSX = "amznfsxuj4obt90.SQLonFSx.AWS" ## Amazon FSx DNS Name
$FSxPS = "amznfsxcvcez6o7.SQLonFSx.AWS" # Amazon FSx PowerShell endpoint

New-Item -ItemType Directory -Name SQLDB -Path \\$FSX\D$\

## Set NTFS Permissions
$ACL = Get-Acl \\$FSx\D$\SQLDB
$Ar = New-Object system.security.accesscontrol.filesystemaccessrule("sqlonfsx.aws\DBAdmins","FullControl",”ContainerInherit", "ObjectInherit", "None", "Allow")
$ACL.SetAccessRule($Ar)
Set-Acl \\$FSX\D$\SQLDB $ACL

$ACL = Get-Acl \\$FSx\D$\SQLDB
$Ar = New-Object system.security.accesscontrol.filesystemaccessrule("sqlonfsx.aws\SQLSA","FullControl","ContainerInherit, "ObjectInherit", "None", "Allow")
$ACL.SetAccessRule($Ar)
Set-Acl \\$FSX\D$\SQLDB $ACL

$ACL = Get-Acl \\$FSx\D$\SQLDB
$Ar = New-Object system.security.accesscontrol.filesystemaccessrule("sqlonfsx.aws\SQLServers","FullControl","ContainerInherit, "ObjectInherit", "None", "Allow")
$ACL.SetAccessRule($Ar)
Set-Acl \\$FSX\D$\SQLDB $ACL

## Create share and set share permissions
Invoke-Command -ComputerName $FSxPS -ConfigurationName FSxRemoteAdmin -scriptblock {
    New-FSxSmbShare -Name "SQLDB" -Path "D:\SQLDB" -Description "SQL Database Share" -ContinuouslyAvailable $True -FolderEnumerationMode AccessBased -EncryptData $True 
    Grant-FSxSmbShareaccess -name SQLDB -AccountName "sqlonfsx.aws\SQLSA","sqlonfsx.aws\DBAdmins","sqlonfsx.aws\SQLServers" -accessRight Full
}

Install SQL Server as failover cluster instance

From this point on, the installation of SQL on top of the Windows Server Failover Cluster (WSFC) is exactly the same as any SQL FCI installation relying on shared file storage. On the database path setup, the path specified to the SQL share must be UNC pathname of the form \\FileSystemDNS\Share (Microsoft SQL Server does not support local loopback such as \\localhost, or a mapped drive.).

In our example:

On the database path setup, the path specified to the SQL share must be UNC pathname

It is also key to ensure that proper permissions have been given to the service account, which are specified right before configuring the database paths.

It is also key to ensure that proper permissions have been given to the service account, which are specified right before configuring the database paths

To complete deployment of your SQL FCI cluster, repeat this step on the second node. After completing this step, you should have a SQL Server Always On Failover Cluster Instance with the database(s) hosted on Amazon FSx.

Cleaning up

After finishing the proof of concept, clean up the resources by deleting the two Amazon FSx file systems (one hosting the database shared storage for the SQL FCI cluster, and the other hosting the witness share), the AWS Managed Active Directory, and the SQL Server cluster instances. Doing so should ensure you incur no forthcoming costs from the resources used in this proof of concept.

Summary

In this blog post, we demonstrated how you can reduce complexity and cost of running SQL Server FCI deployments using Amazon FSx. We walked through the step-by-step process of deploying SQL Server FCI using an Amazon FSx Multi-AZ file system as the shared storage solution.

Here are the takeaways for how Amazon FSx helps you reduce complexity and cost of running a SQL FCI deployment:

  • If you are using SQL Server’s Enterprise Edition license, you can switch to the Standard Edition and save significantly on license costs. This also simplifies the overall complexity of your SQL deployment and ongoing management.
  • If you are using FCI with a shared storage solution that you are administering and maintaining using a third-party storage replication software solution, you have the option to use a fully managed shared storage solution with Amazon FSx to simplify your SQL FCI deployment.
  • If you are running a combination of FCI and AG on-premises to achieve both high availability and disaster recovery, you have the option to move your deployment to a SQL FCI deployment on AWS. This eliminates separate HA and DR solutions, further reducing costs as well as simplifying deployment complexities.
  • You can now use Amazon FSx to host the file share witness for your SQL cluster.

As always, thanks for reading this post. Please do not hesitate to leave a comment with any questions or feedback you may have!

David Twizer

David Twizer

David (Dudu) Twizer is a Sr. Solutions Architect in AWS, he mainly works with AWS customers to provide guidance and technical assistance on running their .Net & Microsoft workloads on AWS. He is also an active contributor to AWS service teams (AWS Systems Manager, Amazon EC2, AWS Directory Service, Amazon RDS, Amazon FSx for Windows), helping to continuously improve the experience on AWS.

Prash Bungale

Prash Bungale

Prashanth Bungale is a Product Manager at Amazon Web Services, where he has been working for 5 years. He heads product management for Amazon FSx for Windows File Server, and was a member of the Amazon EFS product team before that. Prior to AWS, he worked for Harvard University and Intel as a research scientist, and for VMware as a software engineer.

Alain Vetier

Alain Vetier

Alain Vetier is currently a Senior Microsoft Tech Specialized Solutions Architect at AWS. He has more than 15 years of experience in the IT Industry, including 12 years focusing on hybrid and cloud infrastructure. Today he is helping enterprise customers to migrate or build their solution on AWS. Using customer feedback, Alain works with different service teams to make AWS the best platform to run Microsoft workloads.