AWS Database Blog

Monitor your Microsoft SQL Server using custom metrics with Amazon CloudWatch and AWS Systems Manager

Monitoring Microsoft SQL Server-based applications is an important step to ensuring application performance and health. To gain deeper insights into Microsoft SQL Server behavior, you need to collect custom metrics to monitor and alert for abnormalities. Custom metrics for Microsoft SQL Server hosted on Amazon EC2 Windows instances are not readily available out of the box for monitoring and alerting. Amazon CloudWatch is an AWS service that provides monitoring using custom metrics, events, and logs you define for SQL Server workloads running on AWS in real time. Custom metrics can be collected and published to Amazon CloudWatch for monitoring and alerting purposes on either Amazon EC2 Windows instances or on-premises servers where CloudWatch agent and SSM Agent and are running.

Other AWS services

AWS Systems Manager gives you visibility into and control of your infrastructure on AWS.

AWS Systems Manager Parameter Store provides secure, hierarchical storage for configuration data management and secrets management. You can store data such as passwords, database strings, and license codes as parameter values. Parameter Store is offered at no additional charge.

In this blog post, we show you how to configure the CloudWatch agent on Amazon EC2 Windows instances to capture custom metrics for SQL Server from Windows performance monitor. We also show you how to publish those custom metrics and monitor them on Amazon CloudWatch console. We also walk you through on how to store custom configuration in AWS Systems Manager Parameter Store used by CloudWatch agent to capture those metrics and reuse the same configuration across multiple fleets of SQL Server instances where similar kind of metrics are needed.

CloudWatch agent supports the following 64-bit Windows OS versions:

  • Windows Server 2016, Windows Server 2012, Windows Server 2008, etc.

We’ll walk you through the following high-level steps for configuring the Amazon EC2 Windows instance for Amazon CloudWatch using AWS Systems Manager:

  1. Configuring IAM roles for the Amazon EC2 Windows instance
  2. Setting up CloudWatch agent on the Amazon EC2 Windows instance
  3. Creating the CloudWatch agent configuration file
  4. Storing CloudWatch agent configuration in AWS Systems Manager Parameter Store
  5. Applying SQL Server custom metrics configuration on the instances
  6. Monitoring SQL Server metrics in Amazon CloudWatch

Step 1: Configuring IAM roles for the Amazon EC2 Windows instance

Depending on how the CloudWatch agent configuration can be used, two different AWS Identity and Access Management (IAM) roles with policies can be configured on the Amazon EC2 Windows instance. One role with policy CloudWatchAgentServerPolicy enables the CloudWatch agent to be installed on a server and send metrics to Amazon CloudWatch. The other role with policy CloudWatchAgentAdminPolicy is needed to store your CloudWatch agent configuration in AWS Systems Manager Parameter Store, which enables multiple servers to reuse same CloudWatch agent configuration. For instances which just need read only access to reuse the configuration stored in Parameter Store, we recommend configuring them with CloudWatchAgentServerPolicy. See Create IAM Roles and Users for Use with CloudWatch Agent.

To use AWS Systems Manager to install or configure the CloudWatch agent, IAM policy AmazonEC2RoleforSSM needs to be attached to the role apart from the previously mentioned policies. This role enables the instance to communicate with Systems Manager. For more information about how to attach the role to an existing instance, see Attaching an IAM Role to an Instance.

Step 2: CloudWatch agent setup on the Amazon EC2 Windows instance

You can store and view the metrics you collect with the CloudWatch agent in Amazon CloudWatch just as you can with any other CloudWatch metrics.

On a server running Windows Server, installing the CloudWatch agent enables you to collect the system level metrics and SQL Server metrics associated with the counters in Windows Performance Monitor.

Custom metrics that are not available using Windows Performance Monitor can be retrieved using the StatsD protocol. The CloudWatch agent acts as a daemon for the protocol. You can use any standard StatsD client to send the metrics to the CloudWatch agent on port 8125. For more information, see Retrieve Custom Metrics with StatsD. For more information about some of the available StatsD clients, see the StatsD client page on GitHub.

Custom metrics collected by the CloudWatch agent from Windows Performance Monitor can be overridden with custom metric names, units, and dimensions. CloudWatch agent also collects aggregated metrics across different resources provided that the resource exists as part of the Windows Performance Object.

Metrics collected by the CloudWatch agent are billed as custom metrics. For more information about CloudWatch metrics pricing, see Amazon CloudWatch Pricing.

You can use AWS Systems Manager to install the CloudWatch agent on a running Amazon EC2 Windows instance. For installing the CloudWatch agent using AWS Systems Manager see Getting Started: Installing the CloudWatch Agent on Your First Instance.

Note: The CloudWatch agent requires that the instance is running SSM Agent version 2.2.93.0 or later. The easiest way to do that is to use Run Command to update to the latest version. For detailed instructions, see Update SSM Agent by Using Run Command.

Step 3: Creating the CloudWatch agent configuration file

The CloudWatch agent configuration file is a JSON file that specifies the metrics and logs that the agent is to collect.

I’ll demonstrate an Amazon EC2 Windows instance on which the role with CloudWatchAgentAdminPolicy has been applied. I want to show you how the CloudWatch agent configuration is created and also how it’s persisted in the AWS Systems Manager Parameter Store. First, connect to the Amazon EC2 Windows instance on which both SSM Agent and the CloudWatch agent are installed and proper IAM roles are applied as mentioned below in the note section are configured.

Note: If the instance just needs to install and run the CloudWatch agent to publish metrics to Amazon CloudWatch then the role with policy CloudWatchAgentServerPolicy would suffice.

Navigate to C:\Program Files\Amazon\AmazonCloudWatchAgent and create a file something like config.json. There are no naming restrictions.

The following is an example configuration with some of the critical and frequently used metrics to troubleshoot performance related issues. This will be helpful for DBAs or system admins. The example config captures SQL Server metrics, system metrics, and the SQL Server error log all in one place. It publishes with a periodic frequency of 5 seconds to Amazon CloudWatch. When you have system metrics for ex: CPU Utilization and SQL Server metrics for ex: Batch Requests/sec in Amazon CloudWatch you can use them to cross monitor and overlay metrics on each other to pinpoint issues like an increase in the CPU Utilization threshold that causes less Batch Requests/sec processed. For applicable custom metrics relevant to your SQL Server workloads, refer to Microsoft documentation for all metrics available and exposed via Windows Performance Monitor.

See the CloudWatch Agent Configuration File Schema Definition for units, configuring dimensions, configuring polling intervals, and renaming counters.

{
   "agent":{
      "metrics_collection_interval":5,
      "logfile":"c:\\ProgramData\\Amazon\\AmazonCloudWatchAgent\\Logs\\amazon-cloudwatch-agent.log"
   },
   "metrics":{
      "namespace": "sqlserver",
      "metrics_collected":{
         "LogicalDisk":{
            "measurement":[
              {
                 "name":"% Free Space",
                 "rename":"FreeStorageSpaceInPercent",
                 "unit":"Percent"
              },
              {
                 "name":"Free Megabytes",
                 "rename":"FreeStorageSpaceInMB",
                 "unit":"Megabytes"
              }
            ],
            "resources":[
               "C:",
               "D:",
               "_Total"
            ]
         },
         "Processor":{
            "measurement":[
               {
                  "name":"% Processor Time",
                  "rename":"CPUUtilization",
                  "unit":"Percent"
               }
            ],
            "resources":[
               "_Total"
            ]
         },
         "Memory":{
            "metrics_collection_interval":5,
            "measurement":[
              {
                 "name":"Available MBytes",
                 "rename":"FreeableMemory",
                 "unit":"Bytes"
              },
              {
                 "name":"Pages/Sec",
                 "rename":"PagesRetreivedPerSecFromDisk",
                 "unit":"Count/Second"
              }
            ]
         },
         "SQLServer:SQL Statistics":{
            "measurement":[
              {
                 "name":"SQL Compilations/sec",
                 "rename":"SQLCompilationsPerSec",
                 "unit":"Count/Second"
              },
              {
                 "name":"SQL Re-Compilations/sec",
                 "rename":"SQLReCompilationsPerSec",
                 "unit":"Count/Second"
              },
              {
                 "name":"Batch Requests/sec",
                 "rename":"BatchRequestsPerSec",
                 "unit":"Count/Second"
              }
            ]
         },
         "SQLServer:Access Methods":{
            "measurement":[
              "Page Splits/sec",
              "Forwarded Records/sec",
              "Full scans/sec"
            ]
         },
         "SQLServer:General Statistics":{
            "measurement":[
              "Processes blocked",
              {
                 "name":"User Connections",
                 "rename":"DatabaseConnections",
                 "unit":"Count"
              }
            ]
         },
         "SQLServer:Buffer Manager":{
            "measurement":[
              "Page life expectancy",
              "Page writes/sec",
              "Page reads/sec",
              "Buffer cache hit ratio",
              "Checkpoint pages/sec"
            ]
         },
         "SQLServer:Memory Manager":{
            "measurement":[
              "Memory Grants Pending"
            ]
         }
      }
   },
   "logs": {
     "logs_collected": {
       "files": {
         "collect_list": [
           {
             "file_path": "c:\\Program Files\\Microsoft SQL Server\\MSSQL13.MSSQLSERVER\\MSSQL\\Log\\ERRORLOG",
             "log_group_name": "sql-error.log",
             "timezone": "UTC",
             "log_stream_name":"sql_error_log_stream",
             "timestamp_format":"%H:%M:%S %y %b %-d"
           }
         ]
       },
       "windows_events": {
         "collect_list": [
           {
             "event_name": "System",
             "event_levels": [
               "INFORMATION",
               "ERROR"
             ],
             "log_group_name": "System",
             "log_stream_name": "System",
             "event_format": "xml"
           },
           {
             "event_name": "Application",
             "event_levels": [
               "WARNING",
               "ERROR"
             ],
             "log_group_name": "Application",
             "log_stream_name": "Application",
             "event_format": "xml"
           }
         ]
       }
     },
     "log_stream_name": "windows_log_stream"
   }
}

Step 4: Storing the CloudWatch agent configuration in AWS Systems Manager Parameter Store

You can set up SQL Server in clusters, deploy it in distributed Availability Zones, or bring up new nodes to add to existing fleet. It’s easy to apply similar configuration seamlessly across all the SQL Server nodes using config stored in AWS Systems Manager Parameter store.

We can use AWS CLI on Windows to store the configuration in the Parameter Store. If the AWS CLI is not installed on your Amazon EC2 Windows instance, see Install the AWS Command Line Interface on Microsoft Windows. You can Grant privileges to one of the instances in the fleet via which the config can be persisted in AWS Systems Manager Parameter Store. This config can later be used from Parameter Store to apply across the rest of the fleet.

Configure the AWS CLI to a specific Region by executing the AWS Configure command and specifying the Region where you want to store the configuration. You don’t need to enter the access key and secret key because  the Amazon EC2 Windows instance already has the CloudWatchAgentAdminRole applied to publish the configuration to AWS Systems Manager Parameter Store.

Note: The JSON file has a limitation of 4096 characters. I recommend trim out spaces to use a unprettified JSON. You can also enable the advanced tier of SSM Parameter Store as it supports parameters of 8192 characters (incurs additional costs).

aws ssm put-parameter –name “parameter name” –type “String” –value file://configuration_file_pathname

Note: The parameter name needs to start with the prefix “AmazonCloudWatch-”.

After running the command successfully, the configuration is now stored in AWS Systems Manager Parameter Store.

In the AWS Management Console, navigate to the AWS Systems Manager console and choose Parameter Store in the left navigation pane. You should be able to see the new parameter “AmazonCloudWatch-config” as in the below screen shot.

Step 5: Applying SQL Server custom metrics configuration on the instances

AWS Systems Manager can be used to start or restart the CloudWatch agent and apply the configuration stored in the AWS Systems Manager Parameter Store in the previous step. You can either choose Run Command or State Manager features of AWS Systems Manager to integrate with CloudWatch. In this blog post, we show you how to use State Manager to integrate with CloudWatch.

The AWS Systems Manager State Manager feature automates the process of keeping your managed instances in a defined state. There are two areas that make State Manager applicable here: (1) You know that the CloudWatch agent configuration will stay the same (or will be reset to the desired configuration) if someone manually changes it on the instance. (2) Using State Manager you can target associations (and thus configurations) with tags. You can or even target all managed instances in the account with tags, thus ensuring that as new instances are launched, the desired configuration is applied there too. You don’t have to remember to use AWS Systems Manager Run Command to set the configuration.

As shown in the following screenshot, the configuration “AmazonCloudWatch-config” stored in Parameter Store is applied across multiple instances by using the tag “SQLCustomMetricsConfigAppliedServer” in the AWS Systems Manager console. Then the configuration is rolled out across the fleet. The association “SQLServerCustomMetricsConfiguration” keeps the metrics configuration intact across the instances to identify any drift, and it resets back when changed.

To restart the CloudWatch agent using AWS Systems Manager Run Command, see Starting the CloudWatch Agent.

Step 6: Monitor SQL Server metrics in the Amazon CloudWatch console

You can store and view the metrics you collect with the CloudWatch agent in Amazon CloudWatch just as you can with any other Amazon CloudWatch metrics.

Navigate to the Amazon CloudWatch console and choose Metrics in the left navigation pane. Under the All Metrics tab select the namespace “sqlserver” mentioned in the Agent configuration file. If no namespace is listed in the configuration file, the default namespace is “CWAgent”. You should now see several metrics listed under various dimensions.

Note: It’s rarely possible to troubleshoot SQL Server performance issues using single metric.

Let’s look at the following example where a database admin receives complaints mentioning reports were running slow for about two hours but seem normal now. The admin immediately investigates and observes a spike in CPU utilization in the Amazon CloudWatch console over a period where apparently reports were running slow.

By overlaying other metrics like SQL Server Batch requests/sec on the CPU Utilization metric in the Amazon CloudWatch console over an adjustable period, the problem becomes clear immediately. The 100% spike in CPU Utilization on the server is caused by some process running on the server, which made SQL Server starve on CPU cycles and eventually led to slowing down the report queries it could process per second.

The DBA can use this data to troubleshoot in that time range on the SQL Server and look for any processes that were the real culprits that caused the spike in CPU Utilization.

For more available options to search for metrics, see Using Amazon CloudWatch Metrics.

Optionally you can also see the SQL Server log published to Amazon CloudWatch Logs and see the Log Group “sql-error.log”.

Conclusion

In this blog post you have learned how to install the CloudWatch agent on an Amazon EC2 Windows instance and publish custom SQL Server metrics, along with system metrics and SQL Server logs to Amazon CloudWatch and monitor them. By publishing the custom SQL Server metrics along with system metrics to Amazon CloudWatch, you can compare various metrics over a chosen timeframe and overlay them. This makes diagnostic analysis much simpler for DBAs or system admins. In addition, when you use AWS Systems Manager to roll out a similar CloudWatch agent configuration across a fleet of instances where SQL Server is running, you can ensure consistent monitoring. This allows DBAs and system admins to troubleshoot more easily.

Various custom dashboards with multiple metrics, each of which provides a distinct view, can be built in Amazon CloudWatch on top of the metrics that are published for improved monitoring visibility.

See Sending Logs, Events, and Performance Counters to Amazon CloudWatch for additional information on how to send logs and other counters apart from the custom metrics as mentioned in the blog post.


About the Author

Nithin Reddy Cheruku is a cloud infrastructure architect with AWS Professional Services, specializing in DevOps and artificial intelligence. He likes to solve real world day to day problems using technology. Outside of work, Nithin likes to play cricket and ping pong.