AWS Database Blog

Migrate an on-premises SQL Server standalone workload to Amazon RDS Custom for SQL Server using domain-independent, Always On availability groups

There are different strategies you can leverage to migrate an on-premises SQL Server workload to Amazon Relational Database Service (Amazon RDS) Custom for SQL Server. All of them come with pros and cons.

The following are some high-level migration challenges that you may encounter during your analysis and implementation phase:

  • Lift and shift (backup and restore) is efficient for moving an on-premises workload ‘as is’ to AWS, but has more downtime during the cutover.
  • AWS Database Migration Service (AWS DMS) helps you migrate databases to AWS quickly and securely. You can also continuously replicate data with low latency from any supported source to any supported target. However, it has some limitations if you need to move the workload as is to the cloud. For more details , refer to Limitations on using SQL Server as a source for AWS DMS.
  • AWS Snowball moves terabytes of data in about a week. You can use it to move things like databases, backups, archives, healthcare records, analytics datasets, IoT sensor data, and media content, especially when network conditions prevent realistic timelines for transferring large amounts of data both into and out of AWS. AWS Snowball is designed to handle large volume workloads so if your requirement isn’t as large as multiple terabytes then one of the other solutions may be a better fit.

In this post, we provide you a database migration pattern solution for your on-premises SQL Server standalone workload to Amazon RDS Custom, using a domain-independent Always On availability group.

Amazon RDS Custom is a managed database service for legacy, custom, and packaged applications that require access to the underlying OS and DB environment. Amazon RDS Custom for SQL Server automates the setup, operation, and scaling of databases in the cloud, while granting access to the database and underlying operating system to configure settings, install drivers, and enable native features to help you meet the dependent application’s requirements.

The Always On availability group provides a high availability and disaster recovery solution with an enterprise-level alternative to database mirroring. An availability group supports a replicated environment for a discrete set of user databases, known as availability databases, that can fail over together.

Solution overview

This migration pattern typically involves source and destination replica servers. For this post, our source is the on-premises standalone or workgroup server as the primary replica (Node1). Our destination is an RDS Custom for SQL Server database instance as the secondary replica (Node2). DNS configuration is done within the server (hosts file). The following diagram illustrates the solution architecture.

The following are the high-level steps for the solution:

  1. Create a primary and secondary replica
  2. Configure Windows firewall inbound rules/VPC security group rules
  3. Set up connectivity from on-premises to the AWS Cloud
  4. Configure the domain-independent Windows failover cluster
  5. Set up the prerequisites for an Always On an availability group
  6. Create the Always On availability group
  7. Migrate the on-premises database to Amazon RDS Custom for SQL Server

Prerequisites

We assume that you have prior knowledge regarding setting up Always On availability groups.
In this post, we configure a domain-independent availability group to perform a migration. For additional information, refer to What is an Always On availability group and Create a domain-independent availability group. We use SQL Server 2019 Enterprise Edition and Windows Server 2019.

In order to follow this post, the following prerequisites are required:

This solution can incur costs in your AWS Account. Refer to the RDS pricing page for more information. Make sure you remove the resources once you are done with the solution.

We strongly recommend that you set up Always On in a non-production instance and run end-to-end validations before you implement this solution in a production environment. It’s also recommended to follow the preceding high-level steps in the same order of sequence.

Create a primary and secondary replica

For this post, we create an Amazon Elastic Compute Cloud (Amazon EC2) SQL Server instance to represent the on-premises standalone primary replica (Node1).

Take note of your source IP address. In our case, Node1 uses the private IP 10.0.23.99.

Launch Node2, using the following AWS CLI script, and update the parameters accordingly. For more information about how to create an RDS Custom for SQL Server DB instance, refer to Creating and connecting to a DB instance for Amazon RDS Custom for SQL Server.

 aws rds create-db-instance \
    --engine custom-sqlserver-ee \
    --engine-version 15.00.4073.23.v1 \
    --db-instance-identifier <XXXXX> \
    --db-instance-class db.m5.xlarge \
    --db-subnet-group rds-custom-secondary \
    --master-username admin\
    --master-user-password <XXXXX>\
    --backup-retention-period 7 \
    --port 1433 \
    --kms-key-id <kms-key-id> \
    --custom-iam-instance-profile RDSCustomIAMProfile \
    --vpc-security-group-ids security-group \
    --publicly-accessible \
    --region us-east-2

Take note of your destination IP address. In our case, Node2 uses the private IP 172.31.154.104.

Configure Windows firewall inbound rules and VPC security group rules

For successful Windows failover cluster creation and communication to happen, you must open the following ports at the Windows operating system level and VPC security group level.

  • TCP ports 1433, 1434, 4022, 5022, 5023, and 135
  • UDP ports 3343 and 137. For more information about creating inbound port rules in Windows, refer to Create an Inbound Port Rule. To learn more about controlling traffic through security groups, refer to Security group rules.

Set up connectivity from on-premises to the AWS cloud

It’s very important to establish a secure connection from on-premises to the AWS cloud so both can communicate with each other. You can adopt different methodologies to achieve this, for example using AWS Transit Gateway, AWS Direct Connect, or AWS VPN.

Configure a domain-independent Windows failover cluster

Now let’s configure our domain-independent Windows failover cluster. Run the commands in this section in Windows PowerShell 64-bit as an administrator. Note that multiple server restarts will occur as a part of the setup process.

Install the failover clustering feature in all replica servers

Run the following command on both nodes to install the Windows failover clustering feature:

Install-WindowsFeature -Name Failover-Clustering –IncludeManagementTools

Assign a primary DNS suffix to all replica servers

Run the following command on Node1. For this example, we assign the DNS suffix as “OnPremSource.com”.

Set-ItemProperty "HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\" -Name "NV Domain" -Value <OnPremSource.com>

Run the following command on Node2 . For this example, we assign the DNS suffix as “AWSDestination.com”.

Set-ItemProperty "HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\" -Name "NV Domain" -Value <AWSDestination.com>

Update the host file on each replica servers

On both nodes, navigate to C:\Windows\System32\Drives\etc\hosts. Edit the Hosts file manually to add the IP address, host name, and FQDN entry at the end of the file. Restart the nodes for the changes to take effect.

The following table summarizes the parameter details.

. Node_type IP Address Domain Name Hostname Fully Qualified Domain Name
Node1 Amazon EC2/
on-premises
10.0.23.99 OnPremSource.com EC2AMAZ-C1B4GVO EC2AMAZ-C1B4GVO.OnPremSource.com
Node2 RDS Custom for SQL Server 172.31.154.104 AWSDestination.com RDSAMAZ-KLIN0HM RDSAMAZ-KLIN0HM.AWSDestination.com

Use the following format:

  • <IP address of node1> <node1domainname>
  • <IP address of node1> <node1hostname.domainname>
  • <IP address of node2> <node2domainname>
  • <IP address of node2> <node2hostname.domainname>

The following screenshot is an example from Node1.

Alternatively, run the following PowerShell (change the parameters accordingly) to perform the same actions:

$content= “
<IP address of node1> <node1domainname> <IP address of node1> <node1hostname.domainname> <IP address of node2> <node2domainname> <IP address of node2> <node2hostname.domainname>
Add-Content -Path 'C:\Windows\System32\drivers\etc\hosts' $content

Restart-Computer

The following is an example command using our Node1 and Node2 configuration.

$content= “
10.0.23.99 OnPremSource.com
10.0.23.99 EC2AMAZ-C1B4GVO.OnPremSource.com
172.31.154.104 AWSDestination.com
172.31.154.104 RDSAMAZ-KLIN0HM.AWSDestination.com“
Add-Content -Path 'C:\Windows\System32\drivers\etc\hosts' $content

Restart-Computer

Now try to Test-NetConnection between the nodes on port 1433 to make sure both servers have network connectivity. If you encounter an issue, open the Windows firewall’s inbound and outbound rules and review your configuration.

Run the following command on Node1 to test port connectivity to Node2.

Test-NetConnection -port 1433 -ComputerName <AWSDestination.com>

Run the following command on Node2 to test port connectivity to Node1.

Test-NetConnection -port 1433 -ComputerName <on-premisessource.com>

Set up local admin user account on all replica servers

Create a local Windows user and add that user to the administrator’s group. Create a similar user on the other node with the same password. Run the following commands on both nodes:

$Password = Read-Host -AsSecureString
New-LocalUser <”user-name”> -Password $Password -FullName <”full-name”> -Description <"description of username">
Add-LocalGroupMember -Group "Administrators" -Member <”user-name”>

Configure the LocalAccountTokenFilterPolicy registry setting

Run the following command on both nodes to configure the LocalAccountTokenFilterPolicy value to 1. This setting is required because we use a local user account during the failover cluster creation and not the domain account.

Set-ItemProperty -Path HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System -Name LocalAccountTokenFilterPolicy -Value 1

Configure the DNS suffix list

Run the following commands on both the nodes to append the existing DNS suffix list.

$dnsCGSetting = Get-DnsClientGlobalSetting
$dnsCGSetting.SuffixSearchList += "<node1domainname>","<node2domainname>"

set-DnsClientGlobalSetting -SuffixSearchList $dnsCGSetting.SuffixSearchList
Clear-DnsClientCache

Configure a domain-independent Windows failover cluster

Run the following command on Node1 as the new user that you created to create a domain-independent Windows failover cluster. If possible, RDP as the new user and run the command.

New-Cluster -Name <clustername> -Node <ip address of node1>, <ip address of node2>  -AdministrativeAccessPoint DNS -NoStorage -StaticAddress <ipaddress for on-premisessource.com,ipaddress for AWSDestination.com>

Set up the prerequisites for an Always On availability group

Let’s configure our SQL Server Always On availability group, which is a robust high availability and disaster recovery solution in SQL Server.

Configure the SQL Server Log on as a service using a local admin user on both replica servers

Change the SQL Server service account’s log-on as a service to the local admin user you created in the preceding section on both nodes using the following PowerShell code (change the parameters accordingly). Note that this will restart the SQL services automatically. In this example, we use the testing user.

[System.Reflection.Assembly]::LoadWithPartialName('Microsoft.SqlServer.SqlWmiManagement')| Out-Null
$smowmi = New-Object Microsoft.SqlServer.Management.Smo.Wmi.ManagedComputer "<hostname>"
$wmisvc = $smowmi.Services | Where-Object {$_.Name -eq "SQLSERVERAGENT" -or $_.Name -eq "MSSQLSERVER"}
$wmisvc.SetServiceAccount('<hostname>\testing','xxx')

You can also do this in a graphical user interface using the services.msc console.

Enable the Always On availability group feature on both replica servers

Run the following commands on both nodes, changing the parameters accordingly. Note that the third command restarts SQL services:

Import-Module "SQLPS"
Enable-SqlAlwaysOn -ServerInstance <"hostname"> -Force
get-service -Name MSSQLSERVER,SQLSERVERAGENT|Restart-Service -Force

Create the login, master key, certificate, and endpoint on both replica servers

Turn on SQL authentication on both instances and restart them if you haven’t already. For information about changing authentication mode, refer to Change server authentication mode. Run the following TSQL statements in SQL Server Management Studio on the corresponding node’s mentioned. Change the directory locations according to your environment.

On Node1, run the following code to set up the certificate:

-- master key creation
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'XXX';

-- Certificate creation 
CREATE CERTIFICATE [cert_node1] WITH SUBJECT = 'CERT_SUB';

-- Certificate backup 
BACKUP CERTIFICATE [cert_node1]
  TO FILE = 'C:\TESTING\CERT_node1.CER'
        
-- Endpoint creation  
CREATE ENDPOINT [HADR] STATE = STARTED AS TCP (LISTENER_PORT = 5022 ) FOR DATA_MIRRORING ( ROLE = ALL, AUTHENTICATION = CERTIFICATE [cert_node1], ENCRYPTION = REQUIRED ALGORITHM AES);

On Node2, run the following code to set up the certificate:

-- master key creation*
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'XXX';

-- Certificate creation 
CREATE CERTIFICATE [cert_node2] WITH SUBJECT = 'CERT_SUB';

--certificate backup 
BACKUP CERTIFICATE [cert_node2]
  TO FILE = 'C:\TESTING\CERT_node2.CER'
        
--endpoint creation  
CREATE ENDPOINT [HADR] STATE = STARTED AS TCP (LISTENER_PORT = 5022 ) FOR DATA_MIRRORING ( ROLE = ALL, AUTHENTICATION = CERTIFICATE [cert_node2], ENCRYPTION = REQUIRED ALGORITHM AES);

Copy the certificates between nodes either manually or by running the following Windows PowerShell commands on Node1:

Copy-Item -Path \\EC2AMAZ-C1B4GVO\c$\testing\cert_node1.cer -Destination \\RDSAMAZ-KLIN0HM\c$\testing\
Copy-Item -Path \\RDSAMAZ-KLIN0HM\c$\testing\cert_node2.cer -Destination \\EC2AMAZ-C1B4GVO\c$\testing\

On Node1, run the following code:

-- Login creation
 use master;

CREATE LOGIN [aon_node2] WITH PASSWORD = 'XXX';
CREATE USER [aon_node2] FOR LOGIN [aon_node2];

-- Restore Certificate from node2
CREATE CERTIFICATE [cert_node2]
AUTHORIZATION aon_node2 FROM FILE = 'C:\TESTING\CERT_node2.CER'

GRANT CONNECT ON ENDPOINT::[HADR] TO [aon_node2];

On Node2, run the following code:

-- Login creation
 use master;

CREATE LOGIN [aon_node1] WITH PASSWORD = 'XXX';
CREATE USER [aon_node1] FOR LOGIN [aon_node1];


-- Certificate creation
CREATE CERTIFICATE [cert_node1]
AUTHORIZATION aon_node1 FROM FILE = 'C:\TESTING\CERT_node1.CER'

GRANT CONNECT ON ENDPOINT::[HADR] TO [aon_node1];

Create an Always On availability group

Run the following command on Node1 to create an Always On availability group. The file system directory structure of the database files that will be created on Node2 should be identical to Node1 in order for automatic seeding to work. If they’re different, you must manually prepare the secondary database. For more information, see Prepare a secondary database for an Always On availability group.

Note that all SQL Server database files are stored on the D: drive by default, in the D:\rdsdbdata\DATA directory.

If you create or alter the database file location to be anywhere other than the D: drive, then RDS Custom places the DB instance outside the support perimeter. For details, refer to Troubleshooting DB issues for Amazon RDS Custom.

Modify the availability group name, database name, replica value, and endpoint_url value according to your environment.

USE [master]
GO
CREATE AVAILABILITY GROUP [<ag_name>]
WITH (AUTOMATED_BACKUP_PREFERENCE = PRIMARY,
DB_FAILOVER = OFF,
DTC_SUPPORT = NONE,
CLUSTER_TYPE = NONE,
REQUIRED_SYNCHRONIZED_SECONDARIES_TO_COMMIT = 0)
FOR DATABASE [<db_name>]
REPLICA ON N'EC2AMAZ-C1B4GVO' WITH (
        ENDPOINT_URL = N'TCP://EC2AMAZ-C1B4GVO.OnPremSource.com:5022',
        FAILOVER_MODE = MANUAL, 
        AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT, 
        BACKUP_PRIORITY = 50, 
        SEEDING_MODE = AUTOMATIC, 
        SECONDARY_ROLE(ALLOW_CONNECTIONS = NO)),
    N'RDSAMAZ-KLIN0HM' WITH (
        ENDPOINT_URL = N'TCP://RDSAMAZ-KLIN0HM.AWSDestination.com:5022',
        FAILOVER_MODE = MANUAL,
        AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
        BACKUP_PRIORITY = 50, 
        SEEDING_MODE = AUTOMATIC, 
        SECONDARY_ROLE(ALLOW_CONNECTIONS = NO));
GO
ALTER AVAILABILITY GROUP [<ag_name>] GRANT CREATE ANY DATABASE;

On Node2, run the following code :

ALTER AVAILABILITY GROUP [<ag_name>] JOIN WITH (CLUSTER_TYPE = NONE);
ALTER AVAILABILITY GROUP [<ag_name>] GRANT CREATE ANY DATABASE;

An Always On availability group from on-premises to Amazon RDS Custom for SQL Server is successfully established.

The following screenshot shows Amazon EC2/on-premises replica as primary.

Migrate the on-premises database to Amazon RDS Custom for SQL Server

Up to this step, we have demonstrated the setup to replicate the database as is from on-premises to Amazon RDS Custom for SQL Server.

In a typical migration scenario, you set this up and let it run until your actual cutover day approaches. When you’re ready for the migration, complete the following steps:

  1. Monitor the asynchronous data flow between Node 1 and Node2 using dynamic management views (DMVs). Query the DMV sys.dm_hadr_database_replica_states for last_commit_lsn and last_commit_time on both nodes to get an estimation of how far the destination is behind. For more information about monitoring performance, refer to Monitor performance for Always On availability groups.
  2. Stop all incoming application traffic to Node 1 so no write activities are occurring.
  3. Until now, the availability mode for the database was set to asynchronous commit to gain better performance. Switch that to synchronous commit to help avoid data loss by running the following TSQL on Node1:
    USE [master]
    GO
    ALTER AVAILABILITY GROUP [<group-name>]
    MODIFY REPLICA ON N'<node1>' WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT)
    GO
    USE [master]
    GO
    ALTER AVAILABILITY GROUP [<group-name>]
    MODIFY REPLICA ON N'<node2>' WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT)
    GO
  4. Repeat step 1 to revalidate and make sure RTO and RPO is met. At this point, we’re ready to perform the actual migration.
  5. Run the following TSQL on Node2 to migrate the database from on-premises to Amazon RDS Custom for SQL Server. This allows new read/write application connections to connect to the RDS Custom for SQL Server instance. Because we’re using domain-independent availability groups, force_failover_allow_data_loss is the only way to migrate the database.
    USE master;
    GO
    
    ALTER AVAILABILITY GROUP  [<group_name>] FORCE_FAILOVER_ALLOW_DATA_LOSS
    GO
  6. Redirect your application, dependent services, traffic by modifying your connection string to use the RDS Custom for SQL Server instance so that the write activities can resume from this instance.

The following screenshot shows RDS Custom replica as primary.

 Rollback plan

For any production change, it’s crucial to have a rollback plan. After the database migration from on-premises to Amazon RDS Custom for SQL Server is complete, data movement and synchronization will be suspended by default between the nodes. As a best practice, you should reestablish the synchronization back to on-premises by running the following TSQL on the on-premises node until you reach the cleanup phase so that you have an option to rollback if needed:

ALTER DATABASE [<db-name>] SET HADR RESUME;
GO

Remove the on-premises dependencies

You can monitor your application and database performance for a period of time to ensure stable performance. When you’re ready to run independently on Amazon RDS Custom for SQL Server, remove the on-premises dependencies by running the following TSQL statements

This removes the Always On database from the availability group. On Node2, run the following code:

USE [master]
GO
ALTER AVAILABILITY GROUP [<group-name>] REMOVE DATABASE [<db-name>];

On both nodes, run the following code to drop the availability group completely:

DROP AVAILABILITY GROUP [<group-name>]

Decommission the on-premises servers

Now that you have removed the on-premises dependencies of your RDS Custom for SQL Server instance, you can decommission the on-premises database, instance, or server accordingly. If you want high availability with RDS Custom for SQL Server, refer to Configure high availability with Always On Availability Groups on Amazon RDS Custom for SQL Server.

Cleanup

To avoid incurring future charges, delete the resources you created as part of this post. You can clean up the following AWS resources:

Summary

SQL Server Always On availability groups provide a cost-effective high availability and disaster recovery feature. The domain-independent Always On availability group feature enables you to configure your SQL Server cluster without using Active Directory, which reduces the complexity during a hybrid cluster setup.

In this post, we demonstrated a solution for migrating your on-premises SQL Server workload to Amazon RDS Custom for SQL Server using a domain-independent Always On availability group, which makes the whole migration strategy more simple, secure and cost-effective.

Try out the solution and if you have any comments or questions, leave them in the comments section.


About the authors

Aravind Hariharaputran is Database Consultant with the Professional Services team at Amazon Web Services. He is passionate about databases in general with Microsoft SQL Server as his specialty. He helps build technical solutions that assist customers to migrate and optimize their on-premises database workload to the AWS Cloud. He enjoys spending time with family and playing cricket.

Alok Srivastava is Senior Database Consultant with Professional Services team at Amazon Web Services. He works as database specialist to help customers to architect and migrate their database solutions to AWS.