AWS Startups Blog
Monitoring an App: Examples from the AWS Startup Kit
Monitoring involves processing app logs and metrics to provide insights into your app's performance and health so that you can keep your app running smoothly. It's a best practice and a key aspect of operational excellence, which focuses on how to optimally run systems. Operational excellence is one of the pillars of the AWS Well-Architected Framework. The framework emphasizes the importance of monitoring for automating changes such as scaling, responding to events such as service disruptions, and implementing standards for managing daily operations.
To demonstrate monitoring with AWS services, I'll use the AWS Startup Kit, which is a set of resources designed to accelerate a startup's product development on AWS. This post is the fourth in a series about the Startup Kit. Before you read this post, I recommend that you read Building a VPC with the AWS Startup Kit and Launch your app with the AWS Startup Kit so that you're familiar with the relevant parts of the Startup Kit. This post builds on those two posts and covers more advanced material.
A core component of the Startup Kit is well-architected sample workloads that you can launch within minutes. The workloads are supported by AWS CloudFormation templates and code published on GitHub. For this post, I’ll refer to the templates at https://github.com/awslabs/startup-kit-templates and the Node.js sample app at https://github.com/awslabs/startup-kit-nodejs.
The primary AWS service for monitoring is Amazon CloudWatch. You can use CloudWatch to collect and track metrics, collect and monitor log files, set alarms, and automatically react to changes in your AWS resources. The following architecture diagram shows a typical use case where CloudWatch collects metrics from AWS resources and custom sources, provides access to metric statistics, and provides alarms with actions for sending email notifications and triggering an Auto Scaling group to scale out or scale in.
Out of the box, CloudWatch provides metrics for all of your AWS resources, and you can extend it using custom metrics to capture other key metrics for monitoring your app and infrastructure. There are two options for capturing custom metrics with CloudWatch: using the CloudWatch API, and using filters on CloudWatch Logs. In this post, I examine both options. I also discuss using CloudWatch to set alarms. The discussion includes examples that you can use as a starting point for your own apps.
Custom metrics with the CloudWatch API
You can collect many different metrics at the application level. Because an app's responsiveness is important for its success, it's a best practice to collect a variety of latency metrics. At a minimum, you should collect latency metrics for the app's own API calls, calls to dependencies such as third-party services, and data store transactions. Call counts for each of the app's APIs are also useful for measuring activity in the app.
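One way to collect latency for API calls, dependency calls, and data store transactions is to wrap each call with a timer. The following is a minimal sketch, not code from the sample app: withLatency and the record callback are hypothetical names, and in a real app record might publish the duration to CloudWatch or write it to a log.

```javascript
// Hypothetical helper: wraps an async function so that each call's
// duration (in milliseconds) is reported to a record callback.
function withLatency(name, fn, record) {
  return async (...args) => {
    const start = process.hrtime.bigint();
    try {
      return await fn(...args);
    } finally {
      // Report latency whether the call succeeded or threw.
      const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
      record(`${name}_latency`, elapsedMs);
    }
  };
}

// Example: time a simulated data store query that takes about 20 ms.
const samples = [];
const timedQuery = withLatency(
  'DB_QUERY',
  () => new Promise((resolve) => setTimeout(() => resolve('row'), 20)),
  (metricName, ms) => samples.push({ metricName, ms })
);
```

Recording in a finally block means slow failures are measured too, which matters because a dependency that times out is often the slowest call of all.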
Metrics also can help detect unusual and problematic conditions. For example, a metric can track the number of failed app login attempts. Let’s now examine how the CloudWatch API can be used to create such a metric. Because I’m working with the Startup Kit’s Node.js sample app for this post, I’ll use the AWS SDK for JavaScript to call the CloudWatch API.
If you want to see the examples in this post in action, launch the relevant Startup Kit resources using the Startup Kit templates. Specifically, you'll need to launch the VPC, RDS database, and Elastic Beanstalk app templates, and then run the Node.js sample app in Elastic Beanstalk. You can view the complete code for all examples in the templates and in the GitHub repository of the Node.js sample app.
You can access the CloudWatch API through the AWS.CloudWatch object of the JavaScript SDK. The relevant method is putMetricData. The following code snippet from the sample app’s util/aws.js source file shows how to invoke this method for a count metric such as a count of failed app login attempts. Invoking the exported method with the argument LOGIN_FAILURE will publish a metric named LOGIN_FAILURE_COUNT to CloudWatch.
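const AWS = require('aws-sdk'); // AWS SDK for JavaScript (v2)
// `log` is assumed to be the sample app's logger, defined elsewhere in the app.

const cloudWatch = new AWS.CloudWatch();

// Publish a single data point with value 1 for the metric `${metricName}_COUNT`.
exports.publishMetric = (metricName) => {
  cloudWatch.putMetricData(metricDataHelper(metricName))
    .on('error', (err) => log.error(`Error publishing metrics: ${err}`))
    .send();
};

// Build the putMetricData parameters for one count data point.
function metricDataHelper(metricName) {
  const params = {
    MetricData: [
      {
        MetricName: `${metricName}_COUNT`,
        Timestamp: new Date(),
        Unit: 'Count',
        Value: 1.0
      }
    ],
    Namespace: 'STARTUP_KIT/API'
  };
  return params;
}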
After you launch the sample app, you can view a graph of the LOGIN_FAILURE_COUNT metric published by the app. Follow these steps:
- Get an app account: Create a new account in the sample app, or use a previously created one.
- Generate sample data points: Try to sign in with an incorrect password multiple times over the space of a few minutes.
- Graph the metric: On the CloudWatch console, choose Metrics in the navigation pane. Under the All metrics tab, choose Custom Namespaces, and then choose STARTUP_KIT/API. Choose Metrics with no dimensions, choose the LOGIN_FAILURE_COUNT metric name, and then choose Graph this metric only. Switch to the Graphed metrics tab, and set Statistic to Sum and Period to 1 Minute using the dropdown arrows.
When you’re finished, you should have a graph similar to the graph in the following screenshot.
In the graph, the leftmost data point represents 13 failed login attempts aggregated over the prior 1-minute period. The next data point occurs three minutes later and represents a further 8 failed login attempts over the 1-minute period ending at 21:36.
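To make the aggregation concrete, the following sketch groups raw data points into 1-minute periods and computes the Sum and Average statistics for each period, similar in spirit to what the graph displays. It's a simplified illustration, not CloudWatch's implementation; aggregateByPeriod and the sample timestamps are hypothetical.

```javascript
// Hypothetical sketch: group data points into fixed periods (default 60 s)
// and compute Sum and Average per period, ordered by period start time.
function aggregateByPeriod(points, periodSeconds = 60) {
  const periodMs = periodSeconds * 1000;
  const buckets = new Map();
  for (const { timestamp, value } of points) {
    // Align each timestamp to the start of its period.
    const start = Math.floor(timestamp.getTime() / periodMs) * periodMs;
    const bucket = buckets.get(start) || { sum: 0, count: 0 };
    bucket.sum += value;
    bucket.count += 1;
    buckets.set(start, bucket);
  }
  return [...buckets.entries()]
    .sort(([a], [b]) => a - b)
    .map(([start, { sum, count }]) => ({
      periodStart: new Date(start).toISOString(),
      sum,
      average: sum / count,
    }));
}

// Three failed logins in one minute and two in a later minute,
// each published with value 1 (as publishMetric does).
const points = [
  '2018-01-01T21:32:05Z', '2018-01-01T21:32:40Z', '2018-01-01T21:32:59Z',
  '2018-01-01T21:35:10Z', '2018-01-01T21:35:30Z',
].map((t) => ({ timestamp: new Date(t), value: 1 }));

const perMinute = aggregateByPeriod(points);
// perMinute[0].sum === 3 and perMinute[1].sum === 2; both averages are 1
```

Because every count data point has the value 1, its Average is always 1, which is why Sum is the useful statistic for a count metric, while Average suits a latency metric.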
We examined only one graph, but you can also view multiple graphs at once. For example, you can combine your most frequently viewed graphs in a CloudWatch dashboard, which is a customizable home page in the CloudWatch console that you can use to monitor your resources in a single view.
Instead of publishing a single data point per call, as the preceding code does, you can aggregate data points and publish them in fewer calls. This reduces network activity and the number of API calls, which helps you stay within applicable API limits, such as the CloudWatch API limit of 20 individual metric data points per putMetricData request. In CloudWatch, a set of pre-aggregated data points is known as a statistic set: instead of a single value, you publish the sample count, sum, minimum, and maximum of the data points collected over a period. When you have multiple data points per minute, aggregating them into statistic sets minimizes the number of calls to putMetricData. For example, if you are creating a metric that counts calls to an API that receives a large amount of traffic, a statistic set is preferable to one call per data point.
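A minimal sketch of a statistic set, assuming the app buffers values in memory for a period (for example, 60 seconds) before publishing. The helper names and the GET_ITEMS metric are hypothetical; the StatisticValues fields (SampleCount, Sum, Minimum, Maximum) are the ones putMetricData accepts in place of Value. The resulting object would be passed to cloudWatch.putMetricData in the same way as metricDataHelper's output.

```javascript
// Hypothetical helper: collapse many raw values into one statistic set.
function toStatisticValues(values) {
  if (values.length === 0) {
    throw new Error('A statistic set needs at least one value');
  }
  return {
    SampleCount: values.length,
    Sum: values.reduce((total, v) => total + v, 0),
    // reduce avoids spreading very large arrays into Math.min/Math.max.
    Minimum: values.reduce((m, v) => Math.min(m, v), Infinity),
    Maximum: values.reduce((m, v) => Math.max(m, v), -Infinity),
  };
}

// Build putMetricData params that publish one statistic set instead of
// one data point per API call.
function statisticSetParams(metricName, values, timestamp = new Date()) {
  return {
    Namespace: 'STARTUP_KIT/API',
    MetricData: [
      {
        MetricName: `${metricName}_COUNT`,
        Timestamp: timestamp,
        Unit: 'Count',
        StatisticValues: toStatisticValues(values),
      },
    ],
  };
}

// Example: 250 calls buffered during one minute become a single data point.
const batched = statisticSetParams('GET_ITEMS', new Array(250).fill(1));
```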
Creating metrics with CloudWatch Logs filters
In addition to using an API such as the CloudWatch API to publish metrics, it is also common to extract metrics from log files. CloudWatch Logs simplifies the process of monitoring, storing, and accessing logs from Amazon EC2 instances and other sources. In general, to send logs from EC2 instances to CloudWatch Logs, you first need to install the CloudWatch Logs agent on the instances.
However, in this post I show how to get metrics for the Startup Kit Node.js sample app, which runs on EC2 instances managed by AWS Elastic Beanstalk. There is an easy way to get logs from Elastic Beanstalk to CloudWatch Logs: you simply enable CloudWatch Logs streaming. The sample app enables streaming logs via the Startup Kit’s Elastic Beanstalk CloudFormation template.
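If you aren't using the Startup Kit template, one way to turn on streaming is to set options in the aws:elasticbeanstalk:cloudwatch:logs namespace. The following sketch only builds the parameters for the ElasticBeanstalk updateEnvironment call in the AWS SDK for JavaScript; the helper name and the 7-day retention default are illustrative assumptions, so check the Elastic Beanstalk documentation for the options your platform supports.

```javascript
// Hypothetical helper: build updateEnvironment params that enable
// instance log streaming to CloudWatch Logs for an environment.
function streamLogsParams(environmentName, retentionInDays = 7) {
  const namespace = 'aws:elasticbeanstalk:cloudwatch:logs';
  return {
    EnvironmentName: environmentName,
    OptionSettings: [
      // Stream instance logs (such as nodejs.log) to CloudWatch Logs.
      { Namespace: namespace, OptionName: 'StreamLogs', Value: 'true' },
      // Keep the log groups if the environment is terminated.
      { Namespace: namespace, OptionName: 'DeleteOnTerminate', Value: 'false' },
      // How long CloudWatch Logs retains the streamed log events.
      { Namespace: namespace, OptionName: 'RetentionInDays', Value: String(retentionInDays) },
    ],
  };
}

// Usage (requires credentials and an existing environment):
// new AWS.ElasticBeanstalk().updateEnvironment(streamLogsParams('my-env')).promise();
```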
To demonstrate how to extract metrics from CloudWatch Logs, I'll create a metric that measures the latency of login API calls. This is a critical measurement because users tend to be discouraged from using apps that suffer from slow login, especially if login is required to access the key features of the app. A metric based on a CloudWatch Logs log group is extracted using a metric filter. The following are the steps for creating and using a metric filter for a LOGIN_LATENCY metric. To create the metric filter, use either the CloudFormation template (first step) or the CLI command (second step), but not both.
- CloudFormation: For the purposes of this post, you can create the metric filter by creating a CloudFormation stack based on the devops.cfn.yml template in the templates directory of the Startup Kit templates repository.
- CLI command: Alternatively, you can create a metric filter using the CloudWatch console or the following AWS CLI command (replace <your-Elastic-Beanstalk-environment> with your Elastic Beanstalk environment name). Let's examine the command line by line. In the first line, put-metric-filter is the actual command. The second line specifies the name of the CloudWatch Logs log group to which the filter is applied; in this case, it corresponds to a log named nodejs.log that Elastic Beanstalk automatically streams to CloudWatch Logs. The third line specifies a name for the filter itself, and the fourth line specifies a filter pattern. A filter pattern can match JSON log events, terms in a log event, or, if log entries are fields organized as space-delimited values, particular fields of each log entry. Here, the pattern names four space-delimited fields and requires the third field to equal POST_AUTH_latency. The last two lines specify a name and namespace for the metric, and which field from the filter pattern ($latency) supplies the metric's value.
aws logs put-metric-filter \
--log-group-name /aws/elasticbeanstalk/<your-Elastic-Beanstalk-environment>/var/log/nodejs/nodejs.log \
--filter-name LOGIN_LATENCY \
--filter-pattern '[timestamp, level, message = POST_AUTH_latency, latency]' \
--metric-transformations \
'metricName=LOGIN_LATENCY,metricNamespace=STARTUP_KIT/API,metricValue=$latency'
- Generate data: To view a graph for this metric, first generate some data by logging in a few times.
- Graph the metric: Next, follow the directions in the preceding section for viewing graphs, but choose the LOGIN_LATENCY metric rather than the LOGIN_FAILURE_COUNT metric. Also, set Statistic to Average rather than Sum.
It might take a couple of minutes for your data to appear in the graph.
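The filter pattern expects space-delimited log entries with four fields, where the third field is the literal POST_AUTH_latency. The sketch below makes that concrete under stated assumptions: formatLatencyLine shows a hypothetical log line shape an app could write (the sample app's actual logging code may differ), and matchLatencyLine is a simplified mimic of how the filter extracts $latency. CloudWatch Logs does the real matching; this is not its implementation.

```javascript
// Hypothetical log line in the shape the filter pattern expects:
// [timestamp, level, message = POST_AUTH_latency, latency]
function formatLatencyLine(latencyMs, level = 'info', now = new Date()) {
  return `${now.toISOString()} ${level} POST_AUTH_latency ${latencyMs}`;
}

// Simplified mimic of the space-delimited filter (it ignores quoting and
// bracket grouping): the line must have exactly four fields and the third
// must equal POST_AUTH_latency. Returns the value that metricValue=$latency
// would publish, or null if the line does not match.
function matchLatencyLine(line) {
  const fields = line.trim().split(/\s+/);
  if (fields.length !== 4) return null;
  const [, , message, latency] = fields;
  if (message !== 'POST_AUTH_latency') return null;
  const value = Number(latency);
  return Number.isFinite(value) ? value : null;
}

const sampleLine = formatLatencyLine(42, 'info', new Date('2018-01-01T21:36:00Z'));
// sampleLine === '2018-01-01T21:36:00.000Z info POST_AUTH_latency 42'
// matchLatencyLine(sampleLine) === 42
```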
Alarms and notifications
After you start collecting metrics, you have several options for using them. As discussed earlier, you can use the CloudWatch console to graph metric data. You also can create CloudWatch alarms based on your metrics to provide notifications and respond to events.
A CloudWatch alarm watches a single metric. The alarm performs one or more actions based on the value of the metric relative to a threshold over a number of time periods. The action can be an Amazon EC2 action, an Auto Scaling action, or a notification sent to an Amazon SNS topic. To receive messages published to the SNS topic, you can subscribe one or more endpoints to that topic. For monitoring purposes, SNS endpoints could include email addresses, SMS-enabled devices, web servers (HTTP/HTTPS), AWS Lambda functions, or Amazon SQS queues. These endpoints can be used as part of a workflow that automates responses to the alarm.
Let’s continue from the earlier example of the LOGIN_FAILURE_COUNT metric and create a related alarm. The next steps describe how to create and use the alarm. For alarm creation, use either a CloudFormation template (step 1) or a CLI command (steps 2 and 3), but not both.
- CloudFormation: If you created a CloudFormation stack based on the devops.cfn.yml template in the templates directory of the Startup Kit templates repository, as described earlier in the CloudWatch Logs filter section, a CloudWatch alarm based on the login failure count metric was created for you. A related SNS topic also was created. You can skip steps 2 and 3.
- CLI command – part 1: Alternatively, you can create an alarm using the CloudWatch console or the AWS CLI. First, create an SNS topic for the notifications sent by the alarm, using the CLI command aws sns create-topic --name STARTUP_KIT_DEVOPS. Save the value of the TopicArn field from the response returned by this call.
- CLI command – part 2: Next, create the alarm itself with the following command, replacing <your-SNS-topic-ARN> in the alarm-actions option with the TopicArn value returned by the previous CLI command. Most of the options are self-explanatory. In plain English, this command creates an alarm on LOGIN_FAILURE_COUNT that is triggered if there are more than 10 failed login attempts in a single 60-second period, and that then sends a notification to the specified SNS topic.
aws cloudwatch put-metric-alarm \
--alarm-name LOGIN_FAILURE_ALARM \
--alarm-description 'Alarm on login attempt failures' \
--actions-enabled \
--alarm-actions <your-SNS-topic-ARN> \
--metric-name LOGIN_FAILURE_COUNT \
--namespace STARTUP_KIT/API \
--statistic Sum \
--period 60 \
--evaluation-periods 1 \
--threshold 10 \
--comparison-operator GreaterThanThreshold
- Subscribe to notifications: For this example, the alarm sends a notification to an SNS topic. If you used the CloudFormation template, this topic was created by the same template as the alarm and the metric filter; otherwise, it's the topic you created with the CLI. To confirm that everything works together, it's easiest to subscribe one of your email addresses to the topic. In the SNS console, choose the topic name (STARTUP_KIT_DEVOPS), choose Create subscription, choose Email for the Protocol field, and then enter your email address. Check your email for the confirmation message, and then click the link to confirm.
- Test the alarm and get a notification: Next, use a previously created app account and try to sign in with an incorrect password more than 10 times within a single minute. Check your email for the alarm notification.
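For reference, the same alarm and email subscription can be expressed as parameters for the AWS SDK for JavaScript: putMetricAlarm on AWS.CloudWatch and subscribe on AWS.SNS. The sketch below only builds the params; the helper names are hypothetical, and you would supply your own topic ARN and email address.

```javascript
// Hypothetical helper: putMetricAlarm params equivalent to the CLI command
// for LOGIN_FAILURE_ALARM (Sum over 60 s, one period, threshold 10).
function loginFailureAlarmParams(topicArn) {
  return {
    AlarmName: 'LOGIN_FAILURE_ALARM',
    AlarmDescription: 'Alarm on login attempt failures',
    ActionsEnabled: true,
    AlarmActions: [topicArn],
    MetricName: 'LOGIN_FAILURE_COUNT',
    Namespace: 'STARTUP_KIT/API',
    Statistic: 'Sum',
    Period: 60,
    EvaluationPeriods: 1,
    Threshold: 10,
    ComparisonOperator: 'GreaterThanThreshold',
  };
}

// Hypothetical helper: SNS subscribe params for an email endpoint.
// The subscription stays pending until the recipient confirms it.
function emailSubscriptionParams(topicArn, email) {
  return { TopicArn: topicArn, Protocol: 'email', Endpoint: email };
}

// Usage (requires credentials):
// await new AWS.CloudWatch().putMetricAlarm(loginFailureAlarmParams(arn)).promise();
// await new AWS.SNS().subscribe(emailSubscriptionParams(arn, 'you@example.com')).promise();
```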
After you create the CloudWatch alarm, it is always in one of three states: INSUFFICIENT_DATA, OK, or ALARM. When you first create the alarm, its state will likely be INSUFFICIENT_DATA because there are no data points yet. As failed login attempts are recorded, the alarm transitions to the OK state as long as the sum of failed attempts in the period is less than or equal to the threshold. After the threshold is exceeded, the alarm transitions to the ALARM state and triggers any enabled actions, such as sending notifications.
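The following is a simplified model of those state transitions, assuming the alarm settings above (Sum over 60-second periods, one evaluation period, threshold 10, GreaterThanThreshold). Real CloudWatch evaluation also accounts for missing-data treatment and "M out of N" datapoint settings, which this sketch ignores; alarmState is a hypothetical function.

```javascript
// Simplified sketch of alarm evaluation for GreaterThanThreshold.
// periodSums: per-period Sum values, oldest first; null means no data.
function alarmState(periodSums, { threshold = 10, evaluationPeriods = 1 } = {}) {
  const recent = periodSums.slice(-evaluationPeriods);
  // Not enough periods with data to evaluate yet.
  if (recent.length < evaluationPeriods || recent.some((v) => v === null)) {
    return 'INSUFFICIENT_DATA';
  }
  // ALARM only when every evaluated period breaches the threshold.
  return recent.every((v) => v > threshold) ? 'ALARM' : 'OK';
}

alarmState([]);       // 'INSUFFICIENT_DATA' (no data points yet)
alarmState([3]);      // 'OK' (3 <= 10)
alarmState([3, 11]);  // 'ALARM' (11 > 10 in the latest period)
alarmState([11, 10]); // 'OK' (10 is not greater than 10)
```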
Further Startup Kit reading
To check out the other resources provided by the AWS Startup Kit, refer to the following list of published resources:
- Introducing the Startup Kit Serverless Workload, supported by the GitHub repository https://github.com/awslabs/startup-kit-serverless-workload.
- Building a VPC with the AWS Startup Kit, supported by the GitHub repository https://github.com/awslabs/startup-kit-templates.
- Launch your app with the AWS Startup Kit, also supported by the GitHub repository https://github.com/awslabs/startup-kit-templates.
- Node.js sample app: https://github.com/awslabs/startup-kit-nodejs.