AWS Database Blog
Migrate on-premises SQL Server workloads to Amazon RDS Custom for SQL Server using distributed availability groups
In this post, we provide a SQL Server Always On cluster database migration pattern solution to Amazon Relational Database Service (Amazon RDS) Custom for SQL Server using distributed availability groups. This solution helps reduce the migration downtime through continuous data synchronization combined with a failover process. This post is not a high availability and disaster recovery solution; it is for migration purposes only. We recommend not scaling compute or upgrading your RDS instance to avoid host replacement issues during the migration journey.
Amazon RDS Custom is a managed database service for legacy, custom, and packaged applications that require access to the underlying OS and DB environment. Amazon RDS Custom for SQL Server automates the setup, operation, and scaling of databases in the cloud, while granting access to the database and underlying operating system to configure settings, install drivers, and enable native features to help you meet the dependent application’s requirements.
A distributed availability group is a special type of availability group that spans two separate availability groups. The availability groups that participate in a distributed availability group don’t need to be in the same location. They can be physical, virtual, on premises, in the public cloud, or anywhere that supports an availability group deployment. This includes cross-domain and even cross-platform—such as between an availability group hosted on Linux and one hosted on Windows. As long as two availability groups can communicate, you can configure a distributed availability group with them.
Solution overview
The solution typically involves 2-nodes, source (on premises) and destination (RDS Custom for SQL Server) clusters. For this post, our source and destination cluster nodes and configurations are represented as follows:
- Source cluster primary node – Global primary (GP), referred to as Node1
- Source cluster secondary node – Global secondary (GS), referred as Node2
- Source cluster availability group – Availability group primary (AG1)
- Source cluster Windows Server failover cluster – Windows Server failover cluster primary (WSFC1)
- Destination cluster primary node – Forwarder primary (FP, RDS Custom)
- Destination cluster availability group – Availability group secondary (AG2)
- Destination cluster Windows Server failover cluster – Windows Server failover cluster secondary (WSFC2)
The availability mode between AG1 nodes is synchronous-commit. The availability mode of the distributed availability group between two clusters through GP and FP is asynchronous-commit. Domain Name System (DNS) is used for communication. The failover between AG1 and AG2 through the distributed availability group is manual.
The following are the high-level steps for the solution:
- Configure Windows firewall inbound rules and VPC security group rules.
- Set up connectivity from on premises to the AWS Cloud.
- Set up and configure an on-premises Always On AG1 cluster as primary.
- Create RDS Custom for SQL Server (FP).
- Configure the RDS Custom AG2 cluster with listener.
- Create a distributed availability group between AG1 and AG2.
- Join AG2 to the distributed availability group.
- Join the database on FP (RDS Custom).
- Migrate the on-premises database to Amazon RDS Custom for SQL Server through the distributed availability group.
The following flowchart illustrates the solution workflow.
The following diagram illustrates the solution architecture.
The following diagram illustrates the solution architecture for the multi-node on-premises cluster.
Prerequisites
We assume that you have prior knowledge regarding setting up Always On availability groups. For additional information, refer to What is an Always On availability group. We use SQL Server 2019 Enterprise Edition and Windows Server 2019. In this post, we migrate a database using a distributed availability group. For additional information, refer to Distributed availability groups.
To follow this post, the following prerequisites are required:
- The AWS Command Line Interface (AWS CLI) installed and configured
- An AWS account and appropriate permissions to interact with resources in your AWS account
- Amazon Virtual Private Cloud on AWS
This solution can incur costs in your AWS account. Refer to Amazon RDS Custom for SQL Server Pricing for more information. Make sure you remove the resources when you’re done with the solution.
We recommend that you set up Always On in a non-production instance and run end-to-end validations before you implement this solution in a production environment. It’s also recommended to follow the preceding high-level steps in the same order of sequence.
Configure Windows firewall inbound rules and VPC security group rules
For successful Windows failover cluster creation and communication to happen, you must open the TCP ports 1433, 1434, 4022, 5022, 5023, and 135 at the Windows operating system level and VPC security group level.
For more information about creating inbound port rules in Windows, refer to Create an Inbound Port Rule. To learn more about security groups, refer to Control traffic to your AWS resources using security groups.
Set up connectivity from on premises to the AWS Cloud
It’s important to establish a secure connection from on premises to the AWS Cloud so both can communicate with each other. You can adopt different methodologies to achieve this, for example using AWS Transit Gateway, AWS Direct Connect, or AWS VPN.
Set up and configure an on-premises Always On AG1 cluster as primary
For this post, we create a 2-node Amazon Elastic Compute Cloud (Amazon EC2) SQL Server Always On availability group cluster (Node1 and Node2) using the following quick start to represent the on-premises cluster nodes. If your on-premises cluster is a single-node cluster with an Always On listener setup, proceed to following section. If your on-premises cluster is a single-node workgroup setup, refer to the preceding solution overview for next steps.
Create RDS Custom for SQL Server (FP)
Launch FP (RDS Custom) using the following AWS CLI script and update the parameters accordingly. For more information about how to create an RDS Custom for SQL Server DB instance, refer to Creating and connecting to a DB instance for Amazon RDS Custom for SQL Server.
Configure an RDS Custom AG2 cluster with listener
Now let’s configure the Always On cluster. Run the commands in this section in Windows PowerShell 64-bit as an administrator. Note that multiple server restarts will occur as a part of the setup process.
Install the failover clustering feature on FP (RDS Custom)
Run the following command on FP (RDS Custom) to install the Windows failover clustering feature:
Configure the DNS suffix list
Run the following commands on FP (RDS Custom) to append the existing DNS suffix list:
Configure the DNS server for FP (RDS Custom)
Run the following commands on FP (RDS Custom) to set preferred and alternate DNS server addresses.
After you run the Get-DnsClientServerAddress
command, get the InterfaceAlias
name of the instance (in this example, Ethernet 2 is the InterfaceAlias
name) and pass that in the Set-DNSClientServerAddress
command. Additionally, pass the preferred and alternate IP address of the Active Directory server:
The following is an example screenshot.
Join FP (RDS Custom) to the on-premises (WSFC1) domain
We join the FP (RDS Custom) to the on-premises domain, which is onprem.com in this example.
Run the following command on FP (RDS Custom) and restart the server. Make sure to persist this change. Refer to Persist your OS-level customization within Amazon RDS Custom for SQL Server using Custom Engine Version (CEV) for more information, in the event of host replacement/scale compute.
Run the following commands on FP (RDS Custom) to verify the domain name:
Configure the SQL Server log on as a service using a domain user on FP (RDS Custom)
Change the SQL Server service account’s log on as a service to a domain user on FP (RDS Custom) using the following PowerShell code (change the parameters accordingly). Note that this will restart the SQL services automatically. If the PowerShell command isn’t completing, restart the SQL services manually and rerun the command.
Run the following commands on FP (RDS Custom) to verify the domain user name on the SQL Server service account:
Add the SQL Server service account to the local administrator group
Add the SQL Server service account (domain user) to the local administrator’s group. Run the following command on FP (RDS Custom):
Run the following commands on FP (RDS Custom) to verify the domain user is part of the local admin group:
Add secondary IPs on the cluster node elastic network interface
Run the following command on FP (RDS Custom) to add the cluster IP address and SQL Server Always On listener IP address, in the secondary IPs of the corresponding cluster node’s elastic network interface (ENI). Make sure to pick an unused IP address before adding the ENI. For additional information, refer to Multiple IP addresses
Create an Always On availability group AG2 with listener on FP (RDS Custom)
Now let’s create cluster and configure Always On availability group AG2 on FP (RDS Custom).
Create WSFC2
Run the following commands as the domain admin and PowerShell admin to create WSFC2 in PowerShell on FP (RDS Custom) by providing the cluster name, node name, and static IP:
Enable the Always On availability group feature on FP (RDS Custom)
Run the following commands on FP (RDS Custom), changing the parameters accordingly. Note that the third command restarts SQL services. Additionally, PowerShell may wait longer to complete the command. In that case, restart the SQL services manually to move forward.
Set up an endpoint and login in FP (RDS Custom)
Run the following TSQL commands on FP (RDS Custom) to create a cluster endpoint:
Run the following TSQL on FP (RDS Custom) to add the SQL Server service account as login and grant sysadmin privilege:
Explicitly grant connect privilege on the endpoint to the SQL Server service account using the following command:
Create AG2
Run the following TSQL on FP (RDS Custom):
Create an AG2 listener
Make sure to pre-stage the listener object ahead of time in the active directory and provide the Windows cluster name object$ (in this example, WSFC2$). Create a computer objects permission explicitly for the listener object to be created successfully. For additional information, refer to Configure a listener for an Always On availability group.
Run the following TSQL on FP (RDS Custom):
Create a distributed availability group between AG1 and AG2
On GP (Node1), run the following command to create a distributed availability group. The file system directory structure of the database files that is created on GS (Node2), and FP (RDS Custom) should be identical to GP (Node1) in order for automatic seeding to work. If they’re different, you must manually prepare the secondary database. For more information, see Prepare a secondary database for an Always On availability group.
Note that for FP (RDS Custom), all SQL Server database files are stored on the D: drive by default, in the D:\rdsdbdata\DATA
directory. If you create or alter the database file location to be anywhere other than the D: drive, then Amazon RDS Custom places the DB instance outside the support perimeter. For details, refer to Troubleshooting DB issues for Amazon RDS Custom.
Modify the AVAILABILITY GROUP
and LISTENER_URL
values according to your environment:
Join AG2 to the distributed availability group
On FP, run the following to join AG2 to the distributed availability group:
Join the database on FP (RDS Custom)
Manually prepare the database on FP (RDS Custom) with no recovery. You need to manually join the database to the availability group. Note that the database <database_name> is in AG1 before joining the database on FP (RDS Custom).
On FP (RDS Custom), run the following TSQL:
Migrate the on-premises database to Amazon RDS Custom for SQL Server through the distributed availability group
After you create the distributed availability group, you now have AG1 (on premises) replicating data to AG2 (AWS).
In a typical migration scenario, you set this up and let it run until your actual cutover day. When you’re ready for the migration, complete the following steps:
-
- Run the following TSQL on GP (Node1) to understand the lag of FP (RDS Custom) to GP (Node1):
- Stop all incoming application traffic to GP (Node1) so no write activities are occurring.
- On both GP (Node1) and FP (RDS Custom), run the following code to set the distributed availability group to
SYNCHRONOUS_COMMIT
: - Wait until the status of the distributed availability group has changed to
SYNCHRONIZED
and all replicas have the samelast_hardened_lsn
(per database). - Rerun the TSQL on GP (Node1) and FP (RDS Custom) from Step 1 to ensure
synchronization_state_desc
andlast_hardened_lsn
are up to date. - On GP (Node1), set the role to
SECONDARY
:
At this point, the distributed availability group is not available.
- On FP (RDS Custom), run the following to initiate the failover:
GP (Node1) and FP (RDS Custom) have swapped roles now. GP has become FP, and FP is the new GP. It is recommended to change the availability mode of AG1 and AG2 back to ASYNCHRONOUS_COMMIT
to avoid latency issues.
Post-migration
In case of scale storage or degraded hardware, AWS will replace the old host with a new host and our automation will configure the SQL Server service account back with the NT Services account. If this happens, SMK can’t be automatically decrypted because it was encrypted by a different user (domain user). We recommend reverting the SQL Server service account to the NT Services account so that nothing will break after a degraded hardware host replacement or scale storage.
Rollback plan
To roll back the database and bring the on-premises cluster as primary, complete the following steps:
- Run the following on the new GP (old FP) to set the availability mode to
SYNCHRONOUS_COMMIT
: - On GP (old FP), set the role to
SECONDARY
: - On FP (RDS Custom, old GP), run the following to initiate the failover:
Remove the on-premises dependencies
You can monitor your application and database performance for a period of time to ensure stable performance. When you’re ready to run independently on AG2 (the RDS Custom for SQL Server cluster), remove the on-premises dependencies.
- On FP (RDS Custom, old GP), run the following command to stop data synchronization:
At this point, AG2’s databases are online and AG1’s databases are in restoring state.
- Drop the distributed availability group. It’s recommended to run the following code on GP (old FP) first, followed by FP (old GP):
Decommission the on-premises cluster
Now that you have removed the on-premises dependencies from AG2, you can decommission the on-premises database, instance and server accordingly.
Clean up
To avoid incurring future charges, delete the resources you created as part of this post:
- To delete the EC2 instance, refer to Terminate your instance.
- To delete the DB instance, refer to Deleting a DB instance.
Summary
SQL Server Always On availability groups provide a cost-effective high availability and disaster recovery feature. A distributed availability group is a special type of availability group that enables you to migrate cluster databases spanning across two or more separate availability groups.
In this post, we combined the simplicity of Always On availability groups and the flexibility of distributed availability groups and demonstrated a solution for migrating your on-premises SQL Server cluster to an RDS Custom for SQL Server cluster.
Try out the solution and if you have any comments or questions, leave them in the comments section. Stay tuned for further posts on more SQL Server migration solutions.
About the Authors
Aravind Hariharaputran is Database Consultant with the Professional Services team at Amazon Web Services. He is passionate about databases in general with Microsoft SQL Server as his specialty. He helps build technical solutions that assist customers to migrate and optimize their on-premises database workload to the AWS Cloud. He enjoys spending time with family and playing cricket.
Jeril Jose is Database Specialist Consultant with over 14 years of experience in Microsoft SQL Server and other database technologies. He helps customers architect, migrate and optimize their database solutions to AWS. Prior to joining AWS, he supported production and mission-critical database implementation across financial and retail segments.
Brett Haralson is a Database Consultant with Professional Services team at Amazon Web Services with a specialization in SQL Server. He helps customers to build high-available, cost-effective database solutions and migrate their large-scale SQL Server databases to AWS.