Category: Amazon CloudWatch


Amazon CloudWatch launches Alarms on Dashboards

by Tara Walker | in Amazon CloudWatch, Launch

Amazon CloudWatch is a service that gives customers the ability to monitor their applications, systems, and solutions running on Amazon Web Services by collecting and providing metrics, logs, and events about AWS resources in real time. CloudWatch automatically provides key resource measurements such as latency, error rates, and CPU usage, while also enabling monitoring of custom metrics via customer-supplied logs and system data.

Last November, Amazon CloudWatch added new Dashboard Widgets to provide additional data visualization options for all available metrics. To give customers even more insight into their solutions and resources running on AWS, CloudWatch has now launched Alarms on Dashboards. With this enhancement, customers can view alarms and metrics in the same dashboard widget, enabling data-driven troubleshooting and analysis.

CloudWatch dashboards are designed to provide better visibility when monitoring AWS resources across regions in a consolidated view. Since CloudWatch dashboards are highly customizable, users can create their own custom dashboards to graphically represent data for varying metrics such as utilization, performance, estimated billing, and now alarm conditions. An alarm tracks a single metric over time based on the value of the metric in relation to a specified threshold. When the alarm state changes, an action is taken, such as executing an Auto Scaling policy or sending a notification to Amazon SNS.
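
As a point of reference, here is a minimal boto3 sketch of creating such an alarm. The alarm name, instance ID, threshold, and SNS topic ARN are illustrative assumptions, not values from this post:

import boto3

cloudwatch = boto3.client('cloudwatch')

# Hypothetical alarm: notify an SNS topic when average CPU utilization
# stays above 80% for two consecutive 5-minute periods.
cloudwatch.put_metric_alarm(
    AlarmName='HighCPUAlarm',
    Namespace='AWS/EC2',
    MetricName='CPUUtilization',
    Dimensions=[{'Name': 'InstanceId', 'Value': 'i-0123456789abcdef0'}],
    Statistic='Average',
    Period=300,
    EvaluationPeriods=2,
    Threshold=80.0,
    ComparisonOperator='GreaterThanThreshold',
    AlarmActions=['arn:aws:sns:us-east-2:123456789012:ops-alerts'],
)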

With the ability to add alarms to dashboards, CloudWatch users have another mechanism to proactively monitor and receive alerts about their AWS resources and applications across multiple regions. In addition, the metric data associated with an alarm, which has been added to a dashboard, can be charted and reviewed. Alarms have three possible states:

  • OK: The value of the alarm metric does not meet the threshold
  • INSUFFICIENT DATA: The alarm has just been created, or there is not enough metric data available to determine whether it is in the OK state or the ALARM state
  • ALARM: The value of the alarm metric meets the threshold

When added to a dashboard, alarms are displayed in red when in the ALARM state, in gray when in the INSUFFICIENT DATA state, and with no color fill when in the OK state. Alarms added to a dashboard are supported by the following widgets: Line, Number, and Stacked area (a programmatic sketch of an alarm widget follows the list below).

  • Number widget: provides a quick, efficient view of the latest value of any desired metric. When used with an alarm, the background color of the latest metric value reflects the state of the alarm.
  • Line widget: graphs the actual values of any collection of chosen metrics over time. When used with an alarm, the alarm threshold and condition are displayed as a horizontal line, which serves as a handy indicator of how far a metric has strayed from the threshold.
  • Stacked area widget: visualizes the net total effect of any collection of chosen metrics. The widget stacks one metric on top of another to illustrate the distribution and contribution of each metric, with an option to display contributions as percentages. When used with an alarm, it likewise displays the alarm threshold and condition as a horizontal line.
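
Under the hood, an alarm added to a dashboard is stored as a widget whose definition references the alarm’s ARN. Here is a minimal, hedged sketch using the PutDashboard API; the dashboard name matches this post, while the alarm ARN is a placeholder:

import json
import boto3

cloudwatch = boto3.client('cloudwatch')

# A line widget driven by an alarm: the widget graphs the alarm's metric
# and draws the alarm threshold as a horizontal annotation.
dashboard_body = {
    "widgets": [{
        "type": "metric",
        "x": 0, "y": 0, "width": 12, "height": 6,
        "properties": {
            "title": "ES Alarm",
            "view": "timeSeries",
            "stacked": False,
            "annotations": {
                "alarms": [
                    # Placeholder ARN; substitute the ARN of your own alarm.
                    "arn:aws:cloudwatch:us-east-2:123456789012:alarm:ES-Alarm"
                ]
            }
        }
    }]
}

cloudwatch.put_dashboard(
    DashboardName='CloudWatchBlog',
    DashboardBody=json.dumps(dashboard_body)
)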

Support for adding multiple metrics to the same alarm widget is currently in the works, and the feature will continue to evolve based on customer feedback.

Adding Alarms on Dashboards

Let’s take a quick look at using alarms on a CloudWatch dashboard. In the AWS Console, I will go to the CloudWatch service. Once in the CloudWatch console, I select Dashboards, click the Create dashboard button, and create the CloudWatchBlog dashboard.


Upon creation of my CloudWatchBlog dashboard, a dialog box opens to allow me to add widgets to the dashboard. Since I want to focus on adding alarms to my dashboard, I will forgo adding widgets for now; I hit the Cancel button and go to the Alarms section of the CloudWatch console.

Once in the Alarms section of the CloudWatch console, you will see all of your alarms and the state of each of the alarms for the current region displayed.

As mentioned earlier, there are three possible alarm states, and as you can see in my console above, alarms in each of these states are displayed. If desired, you can adjust the filter on the console to display alarms by alarm state.

As an example, I am only interested in viewing the alarms with an alarm state of ALARM. Therefore, I will adjust the filter to show only the alarms in the current region whose alarm state is ALARM.

Now only the two alarms that have a current alarm state of ALARM are displayed. One of these alarms is for monitoring the provisioned write capacity units of an Amazon DynamoDB table, and the other is to monitor the CPU utilization of my active Amazon Elasticsearch instance.

Let’s examine the scenario in which I leverage my CloudWatchBlog dashboard as my troubleshooting mechanism for identifying and diagnosing issues with my Elasticsearch solution and its instances. I will first add the Amazon Elasticsearch CPU utilization alarm, ES Alarm, to my CloudWatchBlog dashboard. To add the alarm, I simply select the checkbox by the desired alarm, which in this case is ES Alarm. Then with the alarm selected, I click the Add to Dashboard button.

The Add to dashboard dialog box will open, allowing me to select my CloudWatchBlog dashboard. Additionally, I can select the widget type I would like to use for the display of my alarm. For the ES Alarm, I will choose the Line widget and complete the process of adding this alarm to my dashboard by clicking the Add to dashboard button.

Upon successfully adding ES Alarm to the CloudWatchBlog dashboard, you will see a confirmation notice displayed in the CloudWatch console.

If I then go to the Dashboard section of the console and select my CloudWatchBlog dashboard, I will see the line widget for my alarm, ES Alarm, on the dashboard. To ensure that my ES Alarm widget is a permanent part of the dashboard, I will click the Save dashboard button to preserve the addition of this widget on the dashboard.

As we discussed, one of the benefits of utilizing a CloudWatch dashboard is the ability to add several alarms from various regions onto a dashboard. Since my scenario is leveraging my dashboard as a troubleshooting mechanism for my Elasticsearch solution, I would like to have several alarms and metrics related to my solution displayed on the CloudWatchBlog dashboard. Given this, I will create another alarm for my Elasticsearch instance and add it to my dashboard.

I will first return to the Alarms section of the console and click the Create Alarm button.

The Create Alarm dialog box is displayed showing all of the current metrics available in this region. From the summary, I can quickly see that there are 21 metrics being tracked for Elasticsearch. I will click on the ES Metrics link to view the individual metrics that can be used to create my alarm.

I can review the individual metrics shown for my Elasticsearch instance, and choose which metric I want to base my new alarm on. In this case, I choose the WriteLatency metric by selecting the checkbox for this metric and then click the Next button.


The next screen is where I fill in all the details about my alarm: name, description, alarm threshold, time period, and alarm action. I will name my new alarm, ES Latency Alarm, and complete the rest of the aforementioned data fields. To complete the creation of my new alarm, I click the Create Alarm button.
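
The same alarm can also be created programmatically. A hedged boto3 sketch follows; the threshold, account ID, and SNS topic are assumptions for illustration:

import boto3

cloudwatch = boto3.client('cloudwatch', region_name='us-east-2')

# Hypothetical equivalent of the ES Latency Alarm created in the console.
cloudwatch.put_metric_alarm(
    AlarmName='ES Latency Alarm',
    AlarmDescription='Alert when Elasticsearch write latency is high',
    Namespace='AWS/ES',
    MetricName='WriteLatency',
    Dimensions=[
        {'Name': 'DomainName', 'Value': 'taratestdomain'},  # domain from this post
        {'Name': 'ClientId', 'Value': '123456789012'},      # placeholder account ID
    ],
    Statistic='Average',
    Period=300,
    EvaluationPeriods=1,
    Threshold=100.0,  # illustrative threshold (milliseconds)
    ComparisonOperator='GreaterThanThreshold',
    AlarmActions=['arn:aws:sns:us-east-2:123456789012:es-alerts'],  # placeholder
)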

Upon successful creation of the alarm, a confirmation message box appears at the top of the Alarms console, and the newly created alarm and its status are displayed in the alarms list.

Now I will add my ES Latency Alarm to my CloudWatchBlog dashboard. Again, I click on the checkbox by the alarm and then click the Add to Dashboard button.

This time when the Add to Dashboard dialog comes up, I will choose the Stacked area widget to display the ES Latency Alarm on my CloudWatchBlog dashboard. Clicking the Add to Dashboard button will complete the addition of my ES Latency Alarm widget to the dashboard.

Once back in the console, I again see the confirmation noting the successful addition of the widget. I go to the Dashboards section, click on the CloudWatchBlog dashboard, and can now view the two widgets in my dashboard. To include this widget in the dashboard permanently, I click the Save dashboard button.

The final thing to note about the new CloudWatch feature, Alarms on Dashboards, is that alarms and metrics from other regions can be added to the dashboard for a complete view for troubleshooting. Let’s add a metric to the dashboard with the alarms widget.

Within the console, I will move from my current region, US East (Ohio), to the US East (N. Virginia) region.

Now I will go to the Metrics section of the CloudWatch console. This section displays the metrics from services used in the US East (N. Virginia) region.

My Elasticsearch solution triggers Lambda functions to capture all of the EmployeeInfo DynamoDB database CRUD (Create, Read, Update, Delete) changes via DynamoDB streams and write those changes into my Elasticsearch domain, taratestdomain. Therefore, I will add metrics to my CloudWatchBlog dashboard to track table metrics from DynamoDB.

Specifically, I am going to add the EmployeeInfo table’s ProvisionedWriteCapacityUnits metric to my CloudWatchBlog dashboard.

Back again in the Add to Dashboard dialog, I will select my CloudWatchBlog dashboard and choose to display this metric using the Number widget.

Now the ProvisionedWriteCapacityUnits metric from US East (N. Virginia) is displayed in the CloudWatchBlog dashboard with the Number widget, alongside the alarms from US East (Ohio). To make this update permanent, I will (you guessed it!) click the Save dashboard button.
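
Because dashboards are global while metrics live in a region, each widget definition carries its own region field. Here is a hedged sketch of what the Number widget’s definition might look like in the dashboard body:

# Fragment of a dashboard body: a Number widget that reads a DynamoDB
# metric from US East (N. Virginia), while other widgets on the same
# dashboard show alarms from US East (Ohio).
number_widget = {
    "type": "metric",
    "width": 6, "height": 3,
    "properties": {
        "title": "EmployeeInfo write capacity",
        "view": "singleValue",          # renders as a Number widget
        "region": "us-east-1",          # the metric's home region
        "metrics": [
            ["AWS/DynamoDB", "ProvisionedWriteCapacityUnits",
             "TableName", "EmployeeInfo"]
        ]
    }
}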

Summary

Getting started with alarms on dashboards is easy. You can use alarms on dashboards across regions to proactively monitor alarms, build troubleshooting playbooks, and view desired metrics. You can also choose the metric first in the Metrics UI and then change the widget type to whichever visualization best fits the metric.

Alarms on Dashboards are supported on Line, Stacked Area, and Number widgets. In addition, you can use Text widgets next to alarms on a dashboard to add steps or observations on how to handle changes in the alarm state. To learn more about Amazon CloudWatch widgets and about the additional dashboard widgets, visit the Amazon CloudWatch documentation and the CloudWatch Getting Started guide.


Tara

EC2 Run Command is Now a CloudWatch Events Target

by Jeff Barr | in Amazon CloudWatch, EC2 Systems Manager

Ok, time for another peanut butter and chocolate post! Let’s combine EC2 Run Command (New EC2 Run Command – Remote Instance Management at Scale) and CloudWatch Events (New CloudWatch Events – Track and Respond to Changes to Your AWS Resources) and see what we get.

EC2 Run Command is part of EC2 Systems Manager. It allows you to operate on collections of EC2 instances and on-premises servers reliably and at scale, in a controlled and selective fashion. You can run scripts, install software, collect metrics and log files, manage patches, and much more, on both Windows and Linux.

CloudWatch Events gives you the ability to track changes to AWS resources in near real-time. You get a stream of system events that you can easily route to one or more targets including AWS Lambda functions, Amazon Kinesis streams, Amazon SNS topics, and built-in EC2 and EBS targets.

Better Together
Today we are bringing these two services together. You can now create CloudWatch Events rules that use EC2 Run Command to perform actions on EC2 instances or on-premises servers. This opens the door to all sorts of interesting ideas; here are a few that I came up with:

Final Log Collection – Collect application or system logs from instances that are being shut down (either manually or as a result of a scale-in operation initiated by Auto Scaling).

Error Log Condition – Collect logs after an application crash or a security incident.

Instance Setup – After an instance has started, download & install applications, set parameters and configurations, and launch processes.

Configuration Updates – When a config file is changed in S3, install it on applicable instances (perhaps determined by tags). For example, you could install an updated Apache web server config file on a set of properly tagged instances, and then restart the server so that it picks up the changes. Or, update an instance-level firewall each time the AWS IP Address Ranges are updated.

EBS Snapshot Testing and Tracking – After a fresh snapshot has been created, mount it on a test instance, check the filesystem for errors, and then index the files in the snapshot.

Instance Coordination – Every time an instance is launched or terminated, inform the others so that they can update internal tracking information or rebalance their workloads.

I’m sure that you have some more interesting ideas; please feel free to share them in the comments.

Time for Action!
Let’s set this up. Suppose I want to run a specific Linux shell command every time Auto Scaling adds another instance to an Auto Scaling Group.

I start by opening the CloudWatch Events Console and clicking on Create rule:

I configure my Event Source to be my Auto Scaling Group (AS-Main-1), and indicate that I want to take action when EC2 instances are launched successfully:

Then I set up the target. I choose SSM Run Command, pick the AWS-RunShellScript document, and indicate that I want the command to be run on the instances that are tagged as coming from my Auto Scaling group:

Then I click on Configure details, give my rule a name and a description, and click on Create rule:

With everything set up, the command service httpd start will be run on each instance launched as a result of a scale out operation.
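
If you prefer to script the setup, here is a hedged boto3 sketch of an equivalent rule and target; the rule name, role ARN, and account ID are placeholders:

import json
import boto3

events = boto3.client('events', region_name='us-east-1')

# Fire whenever Auto Scaling successfully launches an instance into AS-Main-1.
events.put_rule(
    Name='RunShellOnScaleOut',
    EventPattern=json.dumps({
        "source": ["aws.autoscaling"],
        "detail-type": ["EC2 Instance Launch Successful"],
        "detail": {"AutoScalingGroupName": ["AS-Main-1"]}
    }),
    State='ENABLED',
)

# Target the AWS-RunShellScript document, selecting instances by the
# tag that Auto Scaling applies to the instances it launches.
events.put_targets(
    Rule='RunShellOnScaleOut',
    Targets=[{
        'Id': 'run-httpd',
        'Arn': 'arn:aws:ssm:us-east-1:123456789012:document/AWS-RunShellScript',
        'RoleArn': 'arn:aws:iam::123456789012:role/events-run-command',
        'Input': json.dumps({'commands': ['service httpd start']}),
        'RunCommandParameters': {
            'RunCommandTargets': [{
                'Key': 'tag:aws:autoscaling:groupName',
                'Values': ['AS-Main-1']
            }]
        }
    }]
)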

Available Now
This new feature is available now and you can start using it today.

Jeff;


CloudTrail Update – Capture and Process Amazon S3 Object-Level API Activity

by Jeff Barr | in Amazon CloudTrail, Amazon CloudWatch, Amazon S3, AWS Lambda

I would like to show you how several different AWS services can be used together to address a challenge faced by many of our customers. Along the way I will introduce you to a new AWS CloudTrail feature that launches today and show you how you can use it in conjunction with CloudWatch Events.

The Challenge
Our customers store many different types of mission-critical data in Amazon Simple Storage Service (S3) and want to be able to track object-level activity on their data. While some of this activity is captured and stored in the S3 access logs, the level of detail is limited and log delivery can take several hours. Customers, particularly in financial services and other regulated industries, are asking for additional detail, delivered on a more timely basis. For example, they would like to be able to know when a particular IAM user accesses sensitive information stored in a specific part of an S3 bucket.

In order to meet the needs of these customers, we are now giving CloudTrail the power to capture object-level API activity on S3 objects, which we call Data events (the original CloudTrail events are now called Management events). Data events include “read” operations such as GET, HEAD, and Get Object ACL as well as “write” operations such as PUT and POST. The level of detail captured for these operations is intended to provide support for many types of security, auditing, governance, and compliance use cases. For example, it can be used to scan newly uploaded data for Personally Identifiable Information (PII), audit attempts to access data in a protected bucket, or to verify that the desired access policies are in effect.

Processing Object-Level API Activity
Putting this all together, we can easily set up a Lambda function that will take a custom action whenever an S3 operation takes place on any object within a selected bucket or a selected folder within a bucket.

Before starting on this post, I created a new CloudTrail trail called jbarr-s3-trail:

I want to use this trail to log object-level activity on one of my S3 buckets (jbarr-s3-trail-demo). In order to do this I need to add an event selector to the trail. The selector is specific to S3, and allows me to focus on logging the events that are of interest to me. Event selectors are a new CloudTrail feature and are being introduced as part of today’s launch, in case you were wondering.

I indicate that I want to log both read and write events, and specify the bucket of interest. I can limit the events to part of the bucket by specifying a prefix, and I can also specify multiple buckets. I can also control the logging of Management events:

CloudTrail supports up to 5 event selectors per trail. Each event selector can specify up to 50 S3 buckets and optional bucket prefixes.
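
The same selector can be configured programmatically via the new PutEventSelectors API. A minimal boto3 sketch, using the trail and bucket names from this post:

import boto3

cloudtrail = boto3.client('cloudtrail', region_name='us-east-1')

cloudtrail.put_event_selectors(
    TrailName='jbarr-s3-trail',
    EventSelectors=[{
        'ReadWriteType': 'All',           # log both read and write Data events
        'IncludeManagementEvents': True,  # keep logging Management events too
        'DataResources': [{
            'Type': 'AWS::S3::Object',
            # The trailing slash covers every object in the bucket;
            # append a prefix to narrow the scope.
            'Values': ['arn:aws:s3:::jbarr-s3-trail-demo/']
        }]
    }]
)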

I set this up, opened my bucket in the S3 Console, uploaded a file, and took a look at one of the entries in the trail. Here’s what it looked like:

{
  "eventVersion": "1.05",
  "userIdentity": {
    "type": "Root",
    "principalId": "99999999999",
    "arn": "arn:aws:iam::99999999999:root",
    "accountId": "99999999999",
    "username": "jbarr",
    "sessionContext": {
      "attributes": {
        "creationDate": "2016-11-15T17:55:17Z",
        "mfaAuthenticated": "false"
      }
    }
  },
  "eventTime": "2016-11-15T23:02:12Z",
  "eventSource": "s3.amazonaws.com",
  "eventName": "PutObject",
  "awsRegion": "us-east-1",
  "sourceIPAddress": "72.21.196.67",
  "userAgent": "[S3Console/0.4]",
  "requestParameters": {
    "X-Amz-Date": "20161115T230211Z",
    "bucketName": "jbarr-s3-trail-demo",
    "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
    "storageClass": "STANDARD",
    "cannedAcl": "private",
    "X-Amz-SignedHeaders": "Content-Type;Host;x-amz-acl;x-amz-storage-class",
    "X-Amz-Expires": "300",
    "key": "ie_sb_device_4.png"
  }
}

Then I create a simple Lambda function:
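
The function body is not reproduced in this post, so here is a hedged sketch of what a minimal handler in its place might look like; it simply logs the bucket, key, and caller from the CloudTrail record:

# Hypothetical stand-in for the S3Watcher function shown in the console.
def lambda_handler(event, context):
    # CloudWatch Events delivers the CloudTrail record under 'detail'.
    detail = event['detail']
    params = detail.get('requestParameters', {})
    print('S3 %s on s3://%s/%s by %s' % (
        detail.get('eventName'),
        params.get('bucketName'),
        params.get('key'),
        detail.get('userIdentity', {}).get('arn'),
    ))
    return {'status': 'ok'}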

Next, I create a CloudWatch Events rule that matches the event name of interest (PutObject) and invokes my Lambda function (S3Watcher):
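
Expressed in code, the rule matches “AWS API Call via CloudTrail” events for PutObject and points at the function. A hedged sketch with placeholder names and ARNs:

import json
import boto3

events = boto3.client('events', region_name='us-east-1')

events.put_rule(
    Name='S3PutObjectRule',
    EventPattern=json.dumps({
        "source": ["aws.s3"],
        "detail-type": ["AWS API Call via CloudTrail"],
        "detail": {
            "eventSource": ["s3.amazonaws.com"],
            "eventName": ["PutObject"]
        }
    }),
    State='ENABLED',
)

events.put_targets(
    Rule='S3PutObjectRule',
    Targets=[{
        'Id': 's3watcher',
        'Arn': 'arn:aws:lambda:us-east-1:123456789012:function:S3Watcher'
    }]
)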

Now I upload some files to my bucket and check to see that my Lambda function has been invoked as expected:

I can also find the CloudWatch entry that contains the output from my Lambda function:

Pricing and Availability
Data events are recorded only for the S3 buckets that you specify, and are charged at the rate of $0.10 per 100,000 events. This feature is available in all commercial AWS Regions.

Jeff;

AWS Price Reduction – CloudWatch Metrics

by Jeff Barr | in Amazon CloudWatch, Price Reduction

Back in 2011 I introduced you to Custom Metrics for CloudWatch and showed you how to publish them from your applications and scripts. At that time, the first ten custom metrics were free of charge and additional metrics were $0.50 per metric per month, regardless of the number of metrics that you published.

Today, I am happy to announce a price change and a quantity discount for CloudWatch metrics. Based on the number of metrics that you publish every month, you can realize savings of up to 96%. Here is the new pricing for the US East (Northern Virginia) Region (the first ten metrics are still free of charge):

  • First 10,000 metrics (0–10,000): $0.30 per metric per month (40% discount over the current price)
  • Next 240,000 metrics (10,001–250,000): $0.10 per metric per month (80% discount)
  • Next 750,000 metrics (250,001–1,000,000): $0.05 per metric per month (90% discount)
  • All remaining metrics (1,000,001 and up): $0.02 per metric per month (96% discount)
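
To make the tiering concrete, here is a small Python sketch that computes the monthly custom-metric charge at the new US East (Northern Virginia) prices, assuming the ten free metrics are applied before the paid tiers:

# Paid tiers: (number of metrics in the tier, price per metric per month).
TIERS = [(10000, 0.30), (240000, 0.10), (750000, 0.05), (float('inf'), 0.02)]

def monthly_metric_cost(total_metrics):
    remaining = max(total_metrics - 10, 0)  # first ten metrics are free
    cost = 0.0
    for size, price in TIERS:
        in_tier = min(remaining, size)
        cost += in_tier * price
        remaining -= in_tier
        if remaining <= 0:
            break
    return cost

# Example: 20,000 metrics -> 10,000 x $0.30 + 9,990 x $0.10 = $3,999.00
print(monthly_metric_cost(20000))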

If you have EC2 Detailed Monitoring enabled you will also see a price reduction with per-month charges reduced from $3.50 per instance per month to $2.10 or lower based on the volume tier. The new prices will take effect on December 1, 2016 with no effort on your part. At that time, the updated prices will be published on the CloudWatch Pricing page.

By the way, if you are using CloudWatch Metrics, be sure to take advantage of other recently announced features such as Extended Metrics Retention, the CloudWatch Plugin for Collectd, CloudWatch Dashboards, and the new Metrics-to-Logs navigation feature.

Jeff;


Amazon CloudWatch Update – Percentile Statistics and New Dashboard Widgets

by Jeff Barr | in Amazon CloudWatch, Launch

There sure is a lot going on with Amazon CloudWatch these days! Earlier this month I showed you how to Jump From Metrics to Associated Logs and told you about Extended Metrics Retention and the User Interface Update.

Today we are improving CloudWatch yet again, adding percentile statistics and two new dashboard widgets. Time is super tight due to AWS re:Invent, so I’ll be brief!

Percentile Statistics
When you run a web site or a cloud application at scale, you need to make sure that you are delivering the expected level of performance to the vast majority of your customers. While it is always a good idea to watch the numerical averages, you may not be getting the whole picture. The average may mask some performance outliers and you might not be able to see, for example, that 1% of your customers are not having a good experience.

In order to understand and visualize performance and behavior in a way that properly conveys the customer experience, percentiles are a useful tool. For example, you can use percentiles to know that 99% of the requests to your web site are being satisfied within 1 second. At Amazon, we use percentiles extensively and now you can do the same. We prefix them with a “p” and express our goals and observed performance in terms of the p90, p99, and p100 (worst case) response times for sites and services. Over the years we have found that responses in the long tail (p99 and above) can be used to detect database hot spots and other trouble spots.

Percentiles are supported for EC2, RDS, and Kinesis as well as for newly created Elastic Load Balancers and Application Load Balancers. They are also available for custom metrics. You can display the percentiles in CloudWatch (including Custom Dashboards) and you can also set alarms.
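
Percentiles are available through the API as well as the console. Here is a hedged boto3 sketch that retrieves p90 and p99 response times for an Application Load Balancer; the load balancer dimension value is a placeholder:

from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client('cloudwatch')

resp = cloudwatch.get_metric_statistics(
    Namespace='AWS/ApplicationELB',
    MetricName='TargetResponseTime',
    Dimensions=[{'Name': 'LoadBalancer',
                 'Value': 'app/my-alb/0123456789abcdef'}],  # placeholder
    StartTime=datetime.utcnow() - timedelta(hours=3),
    EndTime=datetime.utcnow(),
    Period=300,
    ExtendedStatistics=['p90', 'p99'],  # percentile statistics use the "p" prefix
)
for point in sorted(resp['Datapoints'], key=lambda p: p['Timestamp']):
    print(point['Timestamp'], point['ExtendedStatistics'])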

Percentiles can be displayed in conjunction with other metrics. For example, the orange and green lines indicate p90 and p95 CPU Utilization:

You can set any desired percentile in the CloudWatch Console:

Read Elastic Load Balancing: Support for CloudWatch Percentile Metrics to learn more about how to use the new percentile metrics to gain additional visibility into the performance of your applications.

New Dashboard Widgets
You can now add Stacked Area and Number widgets to your CloudWatch Custom Dashboards:

Here’s a Stacked Area widget with my network traffic:

And here’s a Number widget with some EC2 and EBS metrics:

Available Now
These new features are now available in all AWS Regions and you can start using them today!

Jeff;


New – CloudWatch Events for EBS Snapshots

by Jeff Barr | in Amazon CloudWatch, Amazon Elastic Block Store, AWS Lambda, Launch

Cloud computing can improve upon traditional IT operations by giving you the power to automate complex high-level operations that were formerly kept in a runbook or passed along as tribal knowledge. Far too many of these operations involve backup and recovery operations, especially in smaller and less mature organizations.

Many AWS customers make great use of Amazon Elastic Block Store (EBS) volumes, especially given the ease with which they can generate and manage snapshot backups. They are also copying snapshots between regions on a regular basis for disaster recovery and other operational reasons.

Today we are bringing the benefits of automation to EBS with the addition of new CloudWatch Events for EBS snapshots. You can use these events to add additional automation to your cloud-based backup environment. Here are the new events:

  • createSnapshot – Fired after the status of a newly created EBS snapshot changes to Complete.
  • copySnapshot – Fired after the status of a snapshot copy changes to Complete.
  • shareSnapshot – Fired after a snapshot is shared with your AWS account.

A lot of AWS customers monitor the status of their snapshots by making repeated calls to the DescribeSnapshots function and then stepping through the paginated output in order to locate a specific snapshot. These new events open the door to all sorts of event-driven automation, including the cross-region copy that I mentioned earlier.

Using Snapshot Events
In order to get a better understanding of how this feature helps to automate data backup workflows, I’ll create a workflow that copies a completed snapshot to another region. First, I’ll create an IAM policy that grants appropriate permissions. Then I will incorporate an AWS Lambda function (created by my colleagues) that takes action on the createSnapshot event. Finally, I’ll create a CloudWatch Events rule to capture the event and route it to the Lambda function.

I start out by creating an IAM role (CopySnapshotToRegion) with this policy:

Then I created a new Lambda function (you can find the code at Amazon CloudWatch Events for EBS):
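
The linked code is the authoritative version; for orientation, a minimal copy-on-create handler might look like the following hedged sketch (the destination region is an assumption, and the shape of the event detail is based on the EBS event documentation):

import boto3

TARGET_REGION = 'us-west-2'  # assumed destination region

def lambda_handler(event, context):
    # The createSnapshot event carries the snapshot ARN in detail.snapshot_id.
    detail = event['detail']
    snapshot_id = detail['snapshot_id'].split('/')[-1]  # arn -> snap-xxxxxxxx
    source_region = event['region']

    # copy_snapshot runs in the destination region and pulls from the source.
    ec2 = boto3.client('ec2', region_name=TARGET_REGION)
    copy = ec2.copy_snapshot(
        SourceRegion=source_region,
        SourceSnapshotId=snapshot_id,
        Description='Automated copy of %s from %s' % (snapshot_id, source_region),
    )
    print('Started copy:', copy['SnapshotId'])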

Next, I hopped over to the CloudWatch Events Console, clicked on Create rule, and set it up to handle successful createSnapshot events:

And gave it a name:

To test it out, I created a new EBS snapshot in my source region:

The function was invoked as expected and the snapshot was copied to the target region within seconds (in practice, the copy time will depend on the size of the snapshot):

You can also use these events to make copies of snapshots that are shared with you from other accounts. Many AWS customers partition their usage across multiple accounts for various organizational and security reasons; take a look at our AWS Multiple Account Security Strategy to see our in-depth recommendations in this area. Here are two of the five models included therein:

Available Now
The new events are available  in the US East (Northern Virginia), US East (Ohio), US West (Northern California), US West (Oregon), Asia Pacific (Mumbai), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), EU (Frankfurt), EU (Ireland), and South America (São Paulo) Regions and you can start using them today! Take a look and let me know what you come up with.

Jeff;


PS – If you are a developer, development manager, or a product manager and would like to build systems like this, check out the EBS Jobs page.


CloudWatch Update – Jump From Metrics to Associated Logs

by Jeff Barr | in Amazon CloudWatch

A few years ago I showed you how to Store and Monitor OS & Application Log Files with Amazon CloudWatch. Many AWS customers now create filters for their logs, publish the results as CloudWatch metrics, and then raise alarms when something is amiss. For example, they watch their web server logs for 404 errors that denote bad inbound links and 503 errors that can indicate an overload condition.

While the monitoring and alarming are both wonderful tools for summarizing vast amounts of log data, sometimes you need to head in the opposite direction. Starting from an overview, you need to be able to quickly locate the log file entries that were identified by the filters and caused the alarms to fire. If you, like many of our customers, are running hundreds or thousands of instances and monitoring multiple types of log files, this can be even harder than finding the proverbial needle in a haystack.

Today we are launching a new CloudWatch option that will (so to speak) reduce the size of the haystack and make it a lot easier for you to find the needle!

Let’s say that I am looking at this graph and want to understand the spike in the ERROR metric at around 17:00:

I narrow down the time range with a click and a drag:

Then I click on the logs icon (it, along with the other icons, appears only when the mouse is over the graph), and select the log file of interest (ERROR):

CloudWatch opens in a second tab, with a view that shows the desired log files, pre-filtered to the desired time span. I can then expand an entry to see what’s going on (these particular errors were manufactured for demo purposes; they are not very exciting or detailed):
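
The same pre-filtered view can be reproduced from the API. Here is a hedged sketch that pulls a log group’s ERROR entries for the spike window; the log group name and timestamps are placeholders:

from datetime import datetime

import boto3

logs = boto3.client('logs')

def to_millis(dt):
    # CloudWatch Logs expects timestamps in milliseconds since the epoch.
    return int(dt.timestamp() * 1000)

resp = logs.filter_log_events(
    logGroupName='/my-app/web',   # placeholder log group
    filterPattern='ERROR',        # same pattern that feeds the metric filter
    startTime=to_millis(datetime(2016, 11, 7, 16, 45)),
    endTime=to_millis(datetime(2016, 11, 7, 17, 15)),
)
for event in resp['events']:
    print(event['timestamp'], event['message'][:120])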

This feature works great for situations where filters on log files are used to publish metrics. However, what if I am looking at some CloudWatch system metrics that are not associated with a particular log file? I can follow the steps above, but select View logs in this time range from the menu:

I can see all of my CloudWatch Log Groups, filtered for the time range in the graph:

At this point I can use my knowledge of my application’s architecture to guide my decision-making and to help me to choose a Log Group to investigate. Once again, the events in the log group will be filtered so that only those in the time frame of interest will be visible. If a chart contains metrics in the Lambda namespace, links to the log group will be displayed even if no metric filters are in effect.

This new feature is available now and you can start using it today!

Jeff;

New – Sending Metrics for Amazon Simple Email Service (SES)

by Jeff Barr | in Amazon CloudWatch, Amazon Kinesis, Amazon SES, Launch

Amazon Simple Email Service (SES) focuses on deliverability – getting email through to the intended recipients. In my launch blog post (Introducing the Amazon Simple Email Service), I noted that several factors influence delivery, including the level of trust that you have earned with multiple Internet Service Providers (ISPs) and the number of complaints and bounces that you generate.

Today we are launching a new set of sending metrics for SES. There are two aspects to this feature:

Event Stream – You can configure SES to publish a JSON-formatted record to Amazon Kinesis Firehose each time a significant event (sent, rejected, delivered, bounced, or complaint generated) occurs.

Metrics – You can configure SES to publish aggregate metrics to Amazon CloudWatch. You can add one or more tags to each message and use them as CloudWatch dimensions. Tagging messages gives you the power to track deliverability based on campaign, team, department, and so forth.  You can then use this information to fine-tune your messages and your email strategy.
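
Tags are attached when the message is sent. A hedged boto3 sketch; the addresses, configuration set, and tag values are placeholders:

import boto3

ses = boto3.client('ses', region_name='us-east-1')

ses.send_email(
    Source='sender@example.com',
    Destination={'ToAddresses': ['recipient@example.com']},
    Message={
        'Subject': {'Data': 'Welcome!'},
        'Body': {'Text': {'Data': 'Hello from SES.'}},
    },
    # The configuration set routes sending events to Firehose/CloudWatch;
    # each tag becomes a CloudWatch dimension.
    ConfigurationSetName='my-config-set',
    Tags=[{'Name': 'campaign', 'Value': 'spring-launch'}],
)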

To learn more, read about Email Sending Metrics on the Amazon Simple Email Service Blog.

Jeff;

Amazon CloudWatch Update – Extended Metrics Retention & User Interface Update

by Jeff Barr | in Amazon CloudWatch

Amazon CloudWatch is a monitoring service for your AWS resources and for the applications that you run on AWS. It collects and tracks metrics, monitors log files, and allows you to set alarms and respond to changes in your AWS resources.

Today we are launching several important enhancements to CloudWatch:

  • Extended Metrics Retention – CloudWatch now stores all metrics for 15 months.
  • Simplified Metric Selection – The CloudWatch Console now makes it easier for you to find and select the metrics of interest.
  • Improved Metric Graphing – Graphing of selected metrics is now easier and more flexible.

Let’s take a look!

Extended Metrics Retention
When we launched CloudWatch back in 2009 (New Features for Amazon EC2: Elastic Load Balancing, Auto Scaling, and Amazon CloudWatch), system metrics were stored for 14 days. Later, when we gave you the ability to publish your own metrics to CloudWatch, the same retention period applied. Many AWS customers would like to access and visualize data over longer periods of time. They want to detect and understand seasonal factors, observe monthly growth trends, and perform year-over-year analysis.

In order to support these use cases (and many others that you’ll undoubtedly dream up), CloudWatch now stores all metrics for 15 months at no extra charge. In order to keep the overall volume of data reasonable, historical data is stored at a lower level of granularity, as follows:

  • One minute data points are available for 15 days.
  • Five minute data points are available for 63 days.
  • One hour data points are available for 455 days (15 months).

So that you can see the value of this new feature first-hand, three months of retained metrics are available immediately. Over the course of the next 12 months, metrics will continue to accumulate until a full 15 months of history are stored.
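
One practical consequence: long-range queries should request coarser periods, since older data exists only at the coarser granularities and a single GetMetricStatistics call returns at most 1,440 data points. A hedged sketch of a year-long query (the instance ID is a placeholder):

from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client('cloudwatch')

# Request daily aggregates: 365 data points stays well under the
# 1,440-data-point-per-call limit, and the period is a multiple of
# the one-hour granularity used for older data.
resp = cloudwatch.get_metric_statistics(
    Namespace='AWS/EC2',
    MetricName='CPUUtilization',
    Dimensions=[{'Name': 'InstanceId', 'Value': 'i-0123456789abcdef0'}],
    StartTime=datetime.utcnow() - timedelta(days=365),
    EndTime=datetime.utcnow(),
    Period=86400,
    Statistics=['Average'],
)
print(len(resp['Datapoints']), 'daily data points')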

As a first-hand illustration of the value of extended metrics retention, this long-term view of the Duration of my AWS Lambda functions tells me that something unusual is happening every two weeks:

Simplified Metric Selection
In order to make it easier for you to find and select the metrics that you would like to examine and graph, the CloudWatch Console now includes a clutter-free card-style view with a clean demarcation between AWS metrics and custom metrics:

The next step is to search for metrics of interest. For example, I’m interested in metrics with “CPU” in the name:

From there I can drill down, select a metric, and graph it. Here’s the CPU Utilization metric across all of my EC2 instances:

I can also show multiple metrics on the same graph:

This is not as illuminating as I would like; I’ll show you how to fix that in just a minute.

Improved Metric Graphing
With metrics now stored for 15 months and easier to find and select, the next step is to make them easier to comprehend. This is where the new tabs (Graphed metrics and Graph options) come into play.

The Graphed metrics tab gives me detailed control over each metric:

The Actions column on the right lets me create an alarm on a metric, duplicate it, or delete it from the graph.

I can click on any of the entries in the Statistic column to make a change:

The ability to duplicate a metric and then change a statistic lets me compare, for example, maximum and average CPU Utilization (I also changed the Period to 6 hours):

Now let’s take a look at the Y Axis control on the Graph options tab. In the last section I graphed seven EC2 metrics at the same time. Due to the variation in range, the graph was not as informative as it could be. I can now choose independent ranges for the left and right Y axes, and then assign metrics to the left or to the right:

I can edit the label for a metric by clicking on it:

I can also choose to set ranges for the left and the right Y axes:

I can use the new date picker to choose the time frame of interest to me. I can look at metrics for a desired absolute date range, or relative to the current date. Here’s how I choose an absolute date range:

And here’s how I choose a relative one:

I can rename a graph by clicking on the current name and entering a new one:

After I have fine-tuned the graph to my liking I can add it to an existing dashboard or create a new one:

Here’s one of my existing dashboards with the new graph at the bottom:

Available Now
The extended metric storage and all of the features that I described above are now available in the US East (Northern Virginia), US West (Oregon), US West (Northern California), EU (Ireland), EU (Frankfurt), South America (São Paulo), Asia Pacific (Singapore), Asia Pacific (Tokyo), Asia Pacific (Seoul), Asia Pacific (Mumbai), and Asia Pacific (Sydney) Regions.

As I noted before, there is no extra charge for the extended metric storage.

Jeff;


New – CloudWatch Plugin for collectd

by Jeff Barr | in Amazon CloudWatch, Amazon EC2

You have had the power to store your own business, application, and system metrics in Amazon CloudWatch for quite some time (see New – Custom Metrics for Amazon CloudWatch to learn more).  As I wrote way back in 2011 when I introduced this feature, “You can view graphs, set alarms, and initiate automated actions based on these metrics, just as you can for the metrics that CloudWatch already stores for your AWS resources.”

Today we are simplifying the process of collecting statistics from your systems and getting them into CloudWatch with the introduction of a new CloudWatch plugin for collectd. By combining collectd’s ability to gather many different types of statistics with the CloudWatch features for storage, display, alerting, and alarming, you can become better informed about the state and performance of your EC2 instances, your on-premises hardware, and the applications running on them. The plugin is being released as an open source project and we are looking forward to your pull requests.

The collectd daemon is written in C for performance and portability. It supports over one hundred plugins, allowing you to collect statistics on Apache and Nginx web server performance, memory usage, uptime, and much more.

Installation and Configuration
I installed and configured collectd and the new plugin on an EC2 instance in order to see it in action.

To get started I created an IAM Policy with permission to write metrics data to CloudWatch:

Then I created an IAM Role that allows EC2 (and hence the collectd code running on my instance) to use my Policy:

If I were planning to use the plugin to collect statistics from my on-premises servers, or if my EC2 instances were already running, I could have skipped these steps and created an IAM user with the appropriate permissions instead. Had I done this, I would have had to put the user’s credentials on the servers or instances.

With the Policy and the Role in place, I launched an EC2 instance and selected the Role:

I logged in and installed collectd:

$ sudo yum -y install collectd

Then I fetched the plugin and the install script, made the script executable, and ran it:

$ chmod a+x setup.py
$ sudo ./setup.py

I answered a few questions and the setup ran without incident, starting up collectd after configuring it:

Installing dependencies ... OK
Installing python dependencies ... OK
Copying plugin tar file ... OK
Extracting plugin ... OK
Moving to collectd plugins directory ... OK
Copying CloudWatch plugin include file ... OK

Choose AWS region for published metrics:
  1. Automatic [us-east-1]
  2. Custom
Enter choice [1]: 1

Choose hostname for published metrics:
  1. EC2 instance id [i-057d2ed2260c3e251]
  2. Custom
Enter choice [1]: 1

Choose authentication method:
  1. IAM Role [Collectd_PutMetricData]
  2. IAM User
Enter choice [1]: 1

Choose how to install CloudWatch plugin in collectd:
  1. Do not modify existing collectd configuration
  2. Add plugin to the existing configuration
Enter choice [2]: 2
Plugin configuration written successfully.
Stopping collectd process ... NOT OK
Starting collectd process ... OK
$

With collectd running and the plugin installed and configured, the next step was to decide on the statistics of interest and configure the plugin to publish them to CloudWatch (note that there is a per-metric cost so this is an important step).

The file /opt/collectd-plugins/cloudwatch/config/blocked_metrics contains a list of metrics that have been collected but not published to CloudWatch:

$ cat /opt/collectd-plugins/cloudwatch/config/blocked_metrics
# This file is automatically generated - do not modify this file.
# Use this file to find metrics to be added to the whitelist file instead.
cpu-0-cpu-user
cpu-0-cpu-nice
cpu-0-cpu-system
cpu-0-cpu-idle
cpu-0-cpu-wait
cpu-0-cpu-interrupt
cpu-0-cpu-softirq
cpu-0-cpu-steal
interface-lo-if_octets-
interface-lo-if_packets-
interface-lo-if_errors-
interface-eth0-if_octets-
interface-eth0-if_packets-
interface-eth0-if_errors-
memory--memory-used
load--load-
memory--memory-buffered
memory--memory-cached

I was interested in memory consumption so I added one line to /opt/collectd-plugins/cloudwatch/config/whitelist.conf:

memory--memory-.*
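
Entries in whitelist.conf are regular expressions matched against the metric names listed in blocked_metrics. A quick local sketch of checking what a given pattern will publish:

import re

pattern = re.compile(r'memory--memory-.*')

blocked = [
    'memory--memory-used',
    'memory--memory-buffered',
    'memory--memory-cached',
    'cpu-0-cpu-idle',
]

# Only the three memory metrics match and would be published to CloudWatch.
print([name for name in blocked if pattern.match(name)])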

The collectd configuration file (/etc/collectd.conf) contains additional settings for collectd and the plugins. I did not need to make any changes to it.

I restarted collectd so that it would pick up the change:

$ sudo service collectd restart

I exercised my instance a bit in order to consume some memory, and then opened up the CloudWatch Console to locate and display my metrics:

This screenshot includes a preview of an upcoming enhancement to the CloudWatch Console; don’t worry if yours doesn’t look as cool (stay tuned for more information on this).

If I had been monitoring a production instance, I could have installed one or more of the collectd plugins. Here’s a list of what’s available on the Amazon Linux AMI:

$ sudo yum list | grep collectd
collectd.x86_64                        5.4.1-1.11.amzn1               @amzn-main
collectd-amqp.x86_64                   5.4.1-1.11.amzn1               amzn-main
collectd-apache.x86_64                 5.4.1-1.11.amzn1               amzn-main
collectd-bind.x86_64                   5.4.1-1.11.amzn1               amzn-main
collectd-curl.x86_64                   5.4.1-1.11.amzn1               amzn-main
collectd-curl_xml.x86_64               5.4.1-1.11.amzn1               amzn-main
collectd-dbi.x86_64                    5.4.1-1.11.amzn1               amzn-main
collectd-dns.x86_64                    5.4.1-1.11.amzn1               amzn-main
collectd-email.x86_64                  5.4.1-1.11.amzn1               amzn-main
collectd-generic-jmx.x86_64            5.4.1-1.11.amzn1               amzn-main
collectd-gmond.x86_64                  5.4.1-1.11.amzn1               amzn-main
collectd-ipmi.x86_64                   5.4.1-1.11.amzn1               amzn-main
collectd-iptables.x86_64               5.4.1-1.11.amzn1               amzn-main
collectd-ipvs.x86_64                   5.4.1-1.11.amzn1               amzn-main
collectd-java.x86_64                   5.4.1-1.11.amzn1               amzn-main
collectd-lvm.x86_64                    5.4.1-1.11.amzn1               amzn-main
collectd-memcachec.x86_64              5.4.1-1.11.amzn1               amzn-main
collectd-mysql.x86_64                  5.4.1-1.11.amzn1               amzn-main
collectd-netlink.x86_64                5.4.1-1.11.amzn1               amzn-main
collectd-nginx.x86_64                  5.4.1-1.11.amzn1               amzn-main
collectd-notify_email.x86_64           5.4.1-1.11.amzn1               amzn-main
collectd-postgresql.x86_64             5.4.1-1.11.amzn1               amzn-main
collectd-rrdcached.x86_64              5.4.1-1.11.amzn1               amzn-main
collectd-rrdtool.x86_64                5.4.1-1.11.amzn1               amzn-main
collectd-snmp.x86_64                   5.4.1-1.11.amzn1               amzn-main
collectd-varnish.x86_64                5.4.1-1.11.amzn1               amzn-main
collectd-web.x86_64                    5.4.1-1.11.amzn1               amzn-main

Things to Know
If you are using version 5.5 or newer of collectd, four metrics are now published by default:

  • df-root-percent_bytes-used – disk utilization
  • memory--percent-used – memory utilization
  • swap--percent-used – swap utilization
  • cpu--percent-active – cpu utilization

You can remove these from the whitelist.conf file if you don’t want them to be published.

The primary repositories for the Amazon Linux AMI, Ubuntu, RHEL, and CentOS currently provide older versions of collectd; please be aware of this change in the default behavior if you install from a custom repo or build from source.

Lots More
There’s quite a bit more than I had time to show you. You can  install more plugins and then configure whitelist.conf to publish even more metrics to CloudWatch. You can create CloudWatch Alarms, set up Custom Dashboards, and more.

To get started, visit AWS Labs on GitHub and download the CloudWatch plugin for collectd.

Jeff;