How do I troubleshoot Microsoft SQL issues on my EC2 Windows instance?

6 minute read

I'm receiving errors on my Amazon Elastic Compute Cloud (Amazon EC2) instance running Microsoft SQL Server. Or, SQL Server is slow causing application logging timeout errors.

Short description

The following are common errors that might occur on your SQL Server:

The SQL Server instance failed over from a primary replica to a secondary.
Event Viewer logs event 1196 "Cluster network name resource 'Cluster Name' failed registration of one or more associated DNS name(s) for the following reason" every 15 minutes.
You can't create a SQL listener.
SQL Server is slow and causing applications to log timeout errors.
Adding a node fails with an "invalid credentials" error after detecting settings for a separate server on SQL setup.

Resolution

The SQL Server instance failed over from a primary replica to a secondary

Fail over often occurs with the underlying cluster group. To determine if the issue is at the cluster level, view the System Logs in the Event Viewer using these steps:

Open the Run application, enter eventvwr, and then select OK.
Select Windows Logs, System. If you see events 1069, 1146, 1230, or 1135 in the logs, then begin investigating at the cluster level by generating cluster logs.
Note: If you see event 1135, then cluster node failover occurred. For information on resolving this issue, see Troubleshooting cluster issue with Event ID 1135 on the Microsoft website.

To generate and retrieve cluster logs, follow these steps:

Generate the latest set of cluster logs by running the get-clusterlogs command in Windows PowerShell.
Retrieve the cluster logs from the default location at C:\windows\cluster\reports.

If the cluster logs show that the cluster failed over due to SQL resources, review the Event Viewer application logs for SQL errors.

For more information, see Mechanics and guidelines of lease, cluster, and health check timeouts for Always On availability groups on the Microsoft website.

For specific details on SQL Server service failures, review the SQL Server error logs. For more information, see View the SQL Server error log in SQL Server Management Studio (SSMS) on the Microsoft website.

Event Viewer logs event 1196 "Cluster network name resource 'Cluster Name' failed registration of one or more associated DNS name(s) for the following reason" every 15 minutes

By default, the Cluster Name Object (CNO) goes to DNS every 15 minutes to check if needed records still exist. If the check fails, then you receive the "Cluster network name resource 'Cluster Name' failed" registration error. This error often occurs when the CNO is missing permissions on the cluster DNS record for the network name in the cluster.

To resolve this, grant full control to the CNO on the DNS record. By default, the cluster usually grants these permissions at the time of creation.

If you're using a static DNS record, or if you don't want the cluster to keep updating your DNS records, then follow these steps:

Navigate to ncpa.cpl, NIC card properties, Ipv4 properties, Advanced, DNS.
Uncheck Register this connection's addresses in DNS.

You can't create a SQL listener

If you can't create a SQL listener, additional troubleshooting might be required. For more information, see Create listener fails with message "The WSFC cluster could not bring the Network Name resource online" on the Microsoft website.

SQL Server is slow and causing applications to log timeout errors

Verify if your SQL Server is experiencing high memory, high CPU, or a disk bottleneck:
Check for high memory consumption on your OS. Open Task Manager and then select the Performance tab. If you see high memory consumption, cap the SQL memory, if it isn't already capped. For more information, see Server memory configuration options on the Microsoft website.

If SQL memory is already capped, access the Windows Performance Monitor (perfmon) to determine what application is using memory. You can set perfmon locally or remotely. For more information, see Windows Performance Monitor Overview on the Microsoft website.

Check for high CPU on the OS. Open Task Manager and then select the Performance tab. Or, review Amazon CloudWatch metrics. Access the Windows Performance Monitor to determine what application is using memory. High CPU is commonly caused by an outdated driver. Upgrade to the latest PV driver version. Or, upgrade the AWS NVMe drivers to the latest version.

Check for high disk consumption on the server, volume slowness, or high I/O. As with high CPU, outdated drivers might cause disk consumption. Upgrade to the latest PV driver version. Or, upgrade the AWS NVMe drivers to the latest version. If the issue persists after updating the drivers, configure perfmon to determine what is consuming disk I/O.

For troubleshooting SQL Server or slow database applications, see Troubleshoot entire SQL Server or database application that appears to be slow on the Microsoft website.
For troubleshooting slow SQL Server I/O issues, see Troubleshoot slow SQL Server performance caused by I/O issues on the Microsoft website.

Adding a node fails with an "invalid credentials" error after detecting settings for a separate server on SQL setup

Verify that one of the following is true:

The SQL agent account name is blank in SQL Setup on the secondary node after detecting the second server detail.
The registry key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL12.SQL2014\ConfigurationState has one or more keys with the value 2.

If either of the preceding is true, then follow these steps:

Open Windows PowerShell, and then run the following commands:

$ClusterName = "YourClusterName"
$FciClusterGroupName = "SQL Server (MSSQLSERVER)"
Add-ClusterResourceType -Name "SQL Server Agent" -Dll "sqagtres.dll"

Note: Modify the variable names, such as YourClusterName, in the preceding command examples to match your environment.

Open the Failover Cluster Manager, and then select Roles, SQL role, Add a resource, More resources.
Select New SQL Server Agent, Properties, Dependencies.
In the Resource field, select SQL Server.
Select the Properties tab, and enter the name of your SQL Server instance in the InstanceName field. Then, enter the name of the SQL cluster in the VirtualServerName field. The new SQL Server agent starts.
Make sure that you change all keys to value 1 in HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL12.SQL2014\ConfigurationState.
Run setup again. You can now add nodes without receiving credentials errors.

Related information

How do I launch Microsoft SQL Server on an EC2 Windows instance?

How do I upgrade my Microsoft SQL Cluster on my EC2 Windows instances?

Topics

Compute

Relevant content

Upgrade SQL Server 2017 to SQL Server 2019 on EC2
rePost-User-4900275
asked a year ago
Does Microsoft SQL Server 2019 Standard on Windows Server 2019 include SSRS?
rePost-User-2073324
asked 2 years ago
Sizing RDS SQL Server for existing SQL Server running on Windows Server
Accepted Answer
rePost-User-Shailesh
asked 2 years ago
Connect To SQL Server On AWS Windows Instance
VipulAgheda
asked a year ago
SQL Server on RDS Issue With Linked Server
Accepted Answer
EXPERT
Tedy_T
asked 4 years ago
How do I upgrade my Microsoft SQL cluster on my EC2 Windows instances?
AWS OFFICIALUpdated a year ago
How do I troubleshoot RDS for SQL Server Windows Authentication issues with AWS Managed Microsoft AD?
AWS OFFICIALUpdated a year ago
What are common issues that might occur when using native backup and restore in RDS for SQL Server?
AWS OFFICIALUpdated 2 years ago
How do I launch Microsoft SQL Server on an EC2 Windows instance?
AWS OFFICIALUpdated 7 months ago
Migrating Microsoft SQL Server workloads to the AWS Cloud
EXPERT
Gayathri Krishnamoorthy
published 8 months ago