Category: Amazon CloudWatch


EC2 Run Command is Now a CloudWatch Events Target

by Jeff Barr | in Amazon CloudWatch, EC2 Systems Manager

Ok, time for another peanut butter and chocolate post! Let’s combine EC2 Run Command (New EC2 Run Command – Remote Instance Management at Scale) and CloudWatch Events (New CloudWatch Events – Track and Respond to Changes to Your AWS Resources) and see what we get.

EC2 Run Command is part of EC2 Systems Manager. It allows you to operate on collections of EC2 instances and on-premises servers reliably and at scale, in a controlled and selective fashion. You can run scripts, install software, collect metrics and log files, manage patches, and much more, on both Windows and Linux.

CloudWatch Events gives you the ability to track changes to AWS resources in near real-time. You get a stream of system events that you can easily route to one or more targets including AWS Lambda functions, Amazon Kinesis streams, Amazon SNS topics, and built-in EC2 and EBS targets.

Better Together
Today we are bringing these two services together. You can now create CloudWatch Events rules that use EC2 Run Command to perform actions on EC2 instances or on-premises servers. This opens the door to all sorts of interesting ideas; here are a few that I came up with:

Final Log Collection – Collect application or system logs from instances that are being shut down (either manually or as a result of a scale-in operation initiated by Auto Scaling).

Error Log Condition – Collect logs after an application crash or a security incident.

Instance Setup – After an instance has started, download & install applications, set parameters and configurations, and launch processes.

Configuration Updates – When a config file is changed in S3, install it on applicable instances (perhaps determined by tags). For example, you could install an updated Apache web server config file on a set of properly tagged instances, and then restart the server so that it picks up the changes. Or, update an instance-level firewall each time the AWS IP Address Ranges are updated.

EBS Snapshot Testing and Tracking – After a fresh snapshot has been created, mount it on a test instance, check the filesystem for errors, and then index the files in the snapshot.

Instance Coordination – Every time an instance is launched or terminated, inform the others so that they can update internal tracking information or rebalance their workloads.

I’m sure that you have some more interesting ideas; please feel free to share them in the comments.

Time for Action!
Let’s set this up. Suppose I want to run a specific Linux shell command every time Auto Scaling adds another instance to an Auto Scaling Group.

I start by opening the CloudWatch Events Console and clicking on Create rule:

I configure my Event Source to be my Auto Scaling Group (AS-Main-1), and indicate that I want to take action when EC2 instances are launched successfully:

Then I set up the target. I choose SSM Run Command, pick the AWS-RunShellScript document, and indicate that I want the command to be run on the instances that are tagged as coming from my Auto Scaling group:

Then I click on Configure details, give my rule a name and a description, and click on Create rule:

With everything set up, the command service httpd start will be run on each instance launched as a result of a scale-out operation.
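If you prefer to set this up programmatically, the same rule and target can be created through the CloudWatch Events API. Here's a minimal sketch using boto3; the rule name, role ARN, account ID, and region are placeholders, while the Auto Scaling group name, document, and command come from the walkthrough above:

import json
import boto3

events = boto3.client("events")
rule_name = "RunCommandOnScaleOut"  # placeholder name

# Fire when an instance launches successfully in the AS-Main-1 group.
events.put_rule(
    Name=rule_name,
    EventPattern=json.dumps({
        "source": ["aws.autoscaling"],
        "detail-type": ["EC2 Instance Launch Successful"],
        "detail": {"AutoScalingGroupName": ["AS-Main-1"]},
    }),
    State="ENABLED",
)

# Run AWS-RunShellScript on instances carrying the group's tag. The role
# must allow CloudWatch Events to invoke Run Command on your behalf.
events.put_targets(
    Rule=rule_name,
    Targets=[{
        "Id": "run-shell-script",
        "Arn": "arn:aws:ssm:us-east-1::document/AWS-RunShellScript",
        "RoleArn": "arn:aws:iam::123456789012:role/EventsRunCommandRole",
        "Input": json.dumps({"commands": ["service httpd start"]}),
        "RunCommandParameters": {
            "RunCommandTargets": [
                {"Key": "tag:aws:autoscaling:groupName", "Values": ["AS-Main-1"]}
            ]
        },
    }],
)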

Available Now
This new feature is available now and you can start using it today.

Jeff;

 

CloudTrail Update – Capture and Process Amazon S3 Object-Level API Activity

by Jeff Barr | in Amazon CloudTrail, Amazon CloudWatch, Amazon S3, AWS Lambda

I would like to show you how several different AWS services can be used together to address a challenge faced by many of our customers. Along the way I will introduce you to a new AWS CloudTrail feature that launches today and show you how you can use it in conjunction with CloudWatch Events.

The Challenge
Our customers store many different types of mission-critical data in Amazon Simple Storage Service (S3) and want to be able to track object-level activity on their data. While some of this activity is captured and stored in the S3 access logs, the level of detail is limited and log delivery can take several hours. Customers, particularly in financial services and other regulated industries, are asking for additional detail, delivered on a more timely basis. For example, they would like to be able to know when a particular IAM user accesses sensitive information stored in a specific part of an S3 bucket.

In order to meet the needs of these customers, we are now giving CloudTrail the power to capture object-level API activity on S3 objects, which we call Data events (the original CloudTrail events are now called Management events). Data events include “read” operations such as GET, HEAD, and Get Object ACL as well as “write” operations such as PUT and POST. The level of detail captured for these operations is intended to provide support for many types of security, auditing, governance, and compliance use cases. For example, it can be used to scan newly uploaded data for Personally Identifiable Information (PII), audit attempts to access data in a protected bucket, or to verify that the desired access policies are in effect.

Processing Object-Level API Activity
Putting this all together, we can easily set up a Lambda function that will take a custom action whenever an S3 operation takes place on any object within a selected bucket or a selected folder within a bucket.

Before starting on this post, I created a new CloudTrail trail called jbarr-s3-trail:

I want to use this trail to log object-level activity on one of my S3 buckets (jbarr-s3-trail-demo). In order to do this I need to add an event selector to the trail. The selector is specific to S3, and allows me to focus on logging the events that are of interest to me. Event selectors are a new CloudTrail feature and are being introduced as part of today’s launch, in case you were wondering.

I indicate that I want to log both read and write events, and specify the bucket of interest. I can limit the events to part of the bucket by specifying a prefix, and I can also specify multiple buckets. I can also control the logging of Management events:

CloudTrail supports up to 5 event selectors per trail. Each event selector can specify up to 50 S3 buckets and optional bucket prefixes.
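The console takes care of this for you, but the same configuration can be expressed through the new PutEventSelectors API. Here's a sketch using boto3, reusing the trail and bucket names from this post:

import boto3

cloudtrail = boto3.client("cloudtrail")

cloudtrail.put_event_selectors(
    TrailName="jbarr-s3-trail",
    EventSelectors=[{
        "ReadWriteType": "All",            # "ReadOnly", "WriteOnly", or "All"
        "IncludeManagementEvents": True,   # keep logging Management events
        "DataResources": [{
            "Type": "AWS::S3::Object",
            # A trailing "/" scopes logging to every object in the bucket;
            # append a prefix to narrow it further.
            "Values": ["arn:aws:s3:::jbarr-s3-trail-demo/"],
        }],
    }],
)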

I set this up, open my bucket in the S3 Console, upload a file, and take a look at one of the entries in the trail. Here's what it looks like:

{
  "eventVersion": "1.05",
  "userIdentity": {
    "type": "Root",
    "principalId": "99999999999",
    "arn": "arn:aws:iam::99999999999:root",
    "accountId": "99999999999",
    "username": "jbarr",
    "sessionContext": {
      "attributes": {
        "creationDate": "2016-11-15T17:55:17Z",
        "mfaAuthenticated": "false"
      }
    }
  },
  "eventTime": "2016-11-15T23:02:12Z",
  "eventSource": "s3.amazonaws.com",
  "eventName": "PutObject",
  "awsRegion": "us-east-1",
  "sourceIPAddress": "72.21.196.67",
  "userAgent": "[S3Console/0.4]",
  "requestParameters": {
    "X-Amz-Date": "20161115T230211Z",
    "bucketName": "jbarr-s3-trail-demo",
    "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
    "storageClass": "STANDARD",
    "cannedAcl": "private",
    "X-Amz-SignedHeaders": "Content-Type;Host;x-amz-acl;x-amz-storage-class",
    "X-Amz-Expires": "300",
    "key": "ie_sb_device_4.png"
  }
}

Then I create a simple Lambda function:
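The function doesn't need to do much in order to prove the point. Here's a hedged sketch of such a handler; it assumes the CloudWatch Events envelope places the CloudTrail record under detail, as in the entry shown above, and simply logs the bucket and key:

import json

def lambda_handler(event, context):
    # CloudWatch Events wraps the CloudTrail record in the "detail" field.
    detail = event.get("detail", {})
    params = detail.get("requestParameters", {})
    print(json.dumps({
        "eventName": detail.get("eventName"),
        "bucket": params.get("bucketName"),
        "key": params.get("key"),
    }))
    return "OK"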

Next, I create a CloudWatch Events rule that matches the event name of interest (PutObject) and invokes my Lambda function (S3Watcher):

Now I upload some files to my bucket and check to see that my Lambda function has been invoked as expected:

I can also find the CloudWatch entry that contains the output from my Lambda function:

Pricing and Availability
Data events are recorded only for the S3 buckets that you specify, and are charged at the rate of $0.10 per 100,000 events. This feature is available in all commercial AWS Regions.

Jeff;

AWS Price Reduction – CloudWatch Metrics

by Jeff Barr | in Amazon CloudWatch, Price Reduction

Back in 2011 I introduced you to Custom Metrics for CloudWatch and showed you how to publish them from your applications and scripts. At that time, the first ten custom metrics were free of charge and additional metrics were $0.50 per metric per month, regardless of the number of metrics that you published.

Today, I am happy to announce a price change and a quantity discount for CloudWatch metrics. Based on the number of metrics that you publish every month, you can realize savings of up to 96%. Here is the new pricing for the US East (Northern Virginia) Region (the first ten metrics are still free of charge):

Tier                  | From      | To         | Price Per Metric Per Month | Discount Over Current Price
First 10,000 Metrics  | 0         | 10,000     | $0.30                      | 40%
Next 240,000 Metrics  | 10,001    | 250,000    | $0.10                      | 80%
Next 750,000 Metrics  | 250,001   | 1,000,000  | $0.05                      | 90%
All Remaining Metrics | 1,000,001 | (no limit) | $0.02                      | 96%

If you have EC2 Detailed Monitoring enabled you will also see a price reduction, with charges reduced from $3.50 per instance per month to $2.10 or lower, based on the volume tier. The new prices will take effect on December 1, 2016 with no effort on your part. At that time, the updated prices will be published on the CloudWatch Pricing page.
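To see how the tiers combine, here's a quick sketch that computes the monthly charge for a given number of metrics under the new pricing (it ignores the ten free metrics for simplicity):

# New US East (Northern Virginia) pricing tiers: (metrics in tier, price each).
TIERS = [
    (10_000, 0.30),        # first 10,000 metrics
    (240_000, 0.10),       # next 240,000 metrics
    (750_000, 0.05),       # next 750,000 metrics
    (float("inf"), 0.02),  # all remaining metrics
]

def monthly_cost(metric_count):
    cost, remaining = 0.0, metric_count
    for tier_size, price in TIERS:
        in_tier = min(remaining, tier_size)
        cost += in_tier * price
        remaining -= in_tier
        if remaining <= 0:
            break
    return cost

# 10,000 * $0.30 + 240,000 * $0.10 + 50,000 * $0.05 = $29,500
print(monthly_cost(300_000))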

By the way, if you are using CloudWatch Metrics, be sure to take advantage of other recently announced features such as Extended Metrics Retention, the CloudWatch Plugin for Collectd, CloudWatch Dashboards, and the new Metrics-to-Logs navigation feature.

Jeff;

 

Amazon CloudWatch Update – Percentile Statistics and New Dashboard Widgets

by Jeff Barr | in Amazon CloudWatch, Launch

There sure is a lot going on with Amazon CloudWatch these days! Earlier this month I showed you how to Jump From Metrics to Associated Logs and told you about Extended Metrics Retention and the User Interface Update.

Today we are improving CloudWatch yet again, adding percentile statistics and two new dashboard widgets. Time is super tight due to AWS re:Invent, so I’ll be brief!

Percentile Statistics
When you run a web site or a cloud application at scale, you need to make sure that you are delivering the expected level of performance to the vast majority of your customers. While it is always a good idea to watch the numerical averages, you may not be getting the whole picture. The average may mask some performance outliers and you might not be able to see, for example, that 1% of your customers are not having a good experience.

In order to understand and visualize performance and behavior in a way that properly conveys the customer experience, percentiles are a useful tool. For example, you can use percentiles to know that 99% of the requests to your web site are being satisfied within 1 second. At Amazon, we use percentiles extensively and now you can do the same. We prefix them with a “p” and express our goals and observed performance in terms of the p90, p99, and p100 (worst case) response times for sites and services. Over the years we have found that responses in the long tail (p99 and above) can be used to detect database hot spots and other trouble spots.

Percentiles are supported for EC2, RDS, and Kinesis as well as for newly created Elastic Load Balancers and Application Load Balancers. They are also available for custom metrics. You can display the percentiles in CloudWatch (including Custom Dashboards) and you can also set alarms.
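For example, here's a sketch that alarms on the p99 of a hypothetical custom latency metric using boto3; percentile statistics are supplied via the ExtendedStatistic parameter rather than Statistic:

import boto3

cloudwatch = boto3.client("cloudwatch")

# The namespace, metric, and dimension names below are hypothetical.
cloudwatch.put_metric_alarm(
    AlarmName="p99-latency-high",
    Namespace="MyApp",
    MetricName="RequestLatency",
    Dimensions=[{"Name": "Service", "Value": "web"}],
    ExtendedStatistic="p99",   # any percentile between p0.0 and p100
    Period=60,
    EvaluationPeriods=5,
    Threshold=1.0,             # alarm when p99 latency exceeds 1 second
    ComparisonOperator="GreaterThanThreshold",
)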

Percentiles can be displayed in conjunction with other metrics. For example, the orange and green lines indicate p90 and p95 CPU Utilization:

You can set any desired percentile in the CloudWatch Console:

Read Elastic Load Balancing: Support for CloudWatch Percentile Metrics to learn more about how to use the new percentile metrics to gain additional visibility into the performance of your applications.

New Dashboard Widgets
You can now add Stacked Area and Number widgets to your CloudWatch Custom Dashboards:

Here’s a Stacked Area widget with my network traffic:

And here’s a Number widget with some EC2 and EBS metrics:

Available Now
These new features are now available in all AWS Regions and you can start using them today!

Jeff;

 

New – CloudWatch Events for EBS Snapshots

by Jeff Barr | in Amazon CloudWatch, Amazon Elastic Block Store, AWS Lambda, Launch

Cloud computing can improve upon traditional IT operations by giving you the power to automate complex high-level operations that were formerly kept in a runbook or passed along as tribal knowledge. Far too many of these involve backup and recovery, especially in smaller and less mature organizations.

Many AWS customers make great use of Amazon Elastic Block Store (EBS) volumes, especially given the ease with which they can generate and manage snapshot backups. They are also copying snapshots between regions on a regular basis for disaster recovery and other operational reasons.

Today we are bringing the benefits of automation to EBS with the addition of new CloudWatch Events for EBS snapshots. You can use these events to add additional automation to your cloud-based backup environment. Here are the new events:

  • createSnapshot – Fired after the status of a newly created EBS snapshot changes to Complete.
  • copySnapshot – Fired after the status of a snapshot copy changes to Complete.
  • shareSnapshot – Fired after a snapshot is shared with your AWS account.

A lot of AWS customers monitor the status of their snapshots by making repeated calls to the DescribeSnapshots function and then stepping through the paginated output in order to locate a specific snapshot. These new events open the door to all sorts of event-driven automation, including the cross-region copy that I mentioned earlier.

Using Snapshot Events
In order to get a better understanding of how this feature helps to automate data backup workflows, I’ll create a workflow that copies a completed snapshot to another region. First, I’ll create an IAM policy that grants appropriate permissions. Then I will incorporate an AWS Lambda function (created by my colleagues) that takes action on the createSnapshot event. Finally, I’ll create a CloudWatch Events rule to capture the event and route it to the Lambda function.

I start out by creating an IAM role (CopySnapshotToRegion) with this policy:

Then I create a new Lambda function (you can find the code at Amazon CloudWatch Events for EBS):
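The linked code is the authoritative version; as a rough sketch of the core logic, a handler along these lines copies the new snapshot to a hard-coded target region (the region names and the event-parsing details here are assumptions, not the exact code):

import boto3

TARGET_REGION = "us-west-2"  # hypothetical destination region

def lambda_handler(event, context):
    # The createSnapshot event carries the snapshot as an ARN in its detail,
    # e.g. arn:aws:ec2:us-east-1::snapshot/snap-0123456789abcdef0.
    snapshot_arn = event["detail"]["snapshot_id"]
    parts = snapshot_arn.split(":")
    source_region = parts[3] or parts[4]  # region slot varies in sample events
    snapshot_id = parts[-1].split("/")[-1]

    # CopySnapshot is called in the destination region.
    ec2 = boto3.client("ec2", region_name=TARGET_REGION)
    copy = ec2.copy_snapshot(
        SourceRegion=source_region,
        SourceSnapshotId=snapshot_id,
        Description="Automated copy of " + snapshot_id,
    )
    print("Started copy:", copy["SnapshotId"])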

Next, I hop over to the CloudWatch Events Console, click on Create rule, and set it up to handle successful createSnapshot events:

And give it a name:

To test it out, I create a new EBS snapshot in my source region:

The function is invoked as expected and the snapshot is copied to the target region within seconds (in practice, the copy time will depend on the size of the snapshot):

You can also use these events to make copies of snapshots that are shared with you from other accounts. Many AWS customers partition their usage across multiple accounts for various organizational and security reasons; take a look at our AWS Multiple Account Security Strategy to see our in-depth recommendations in this area. Here are two of the five models included therein:

Available Now
The new events are available in the US East (Northern Virginia), US East (Ohio), US West (Northern California), US West (Oregon), Asia Pacific (Mumbai), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), EU (Frankfurt), EU (Ireland), and South America (São Paulo) Regions and you can start using them today! Take a look and let me know what you come up with.

Jeff;

 

PS – If you are a developer, development manager, or a product manager and would like to build systems like this, check out the EBS Jobs page.

 

CloudWatch Update – Jump From Metrics to Associated Logs

by Jeff Barr | in Amazon CloudWatch

A few years ago I showed you how to Store and Monitor OS & Application Log Files with Amazon CloudWatch. Many AWS customers now create filters for their logs, publish the results as CloudWatch metrics, and then raise alarms when something is amiss. For example, they watch their web server logs for 404 errors that denote bad inbound links and 503 errors that can indicate an overload condition.
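As a quick refresher, a filter of that sort takes only a few lines to create with boto3; the log group name and the Apache-style pattern below are illustrative:

import boto3

logs = boto3.client("logs")

# Count 404 responses in a (hypothetical) Apache access log group and
# publish the count as a custom CloudWatch metric.
logs.put_metric_filter(
    logGroupName="apache/access.log",
    filterName="404-errors",
    filterPattern="[ip, id, user, timestamp, request, status=404, size]",
    metricTransformations=[{
        "metricName": "404Count",
        "metricNamespace": "WebServer",
        "metricValue": "1",   # add 1 to the metric for each matching line
    }],
)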

While the monitoring and alarming are both wonderful tools for summarizing vast amounts of log data, sometimes you need to head in the opposite direction. Starting from an overview, you need to be able to quickly locate the log file entries that were identified by the filters and caused the alarms to fire. If you, like many of our customers, are running hundreds or thousands of instances and monitoring multiple types of log files, this can be considerably more difficult than finding a needle in a haystack.

Today we are launching a new CloudWatch option that will (so to speak) reduce the size of the haystack and make it a lot easier for you to find the needle!

Let’s say that I am looking at this graph and want to understand the spike in the ERROR metric at around 17:00:

I narrow down the time range with a click and a drag:

Then I click on the logs icon (it, along with the other icons, appears only when the mouse is over the graph), and select the log file of interest (ERROR):

CloudWatch opens in a second tab, with a view that shows the desired log files, pre-filtered to the desired time span. I can then expand an entry to see what’s going on (these particular errors were manufactured for demo purposes; they are not very exciting or detailed):

This feature works great for situations where filters on log files are used to publish metrics. However, what if I am looking at some CloudWatch system metrics that are not associated with a particular log file? I can follow the steps above, but select View logs in this time range from the menu:

I can see all of my CloudWatch Log Groups, filtered for the time range in the graph:

At this point I can use my knowledge of my application’s architecture to guide my decision-making and to help me to choose a Log Group to investigate. Once again, the events in the log group will be filtered so that only those in the time frame of interest will be visible. If a chart contains metrics in the Lambda namespace, links to the log group will be displayed even if no metric filters are in effect.

This new feature is available now and you can start using it today!

Jeff;

New – Sending Metrics for Amazon Simple Email Service (SES)

by Jeff Barr | in Amazon CloudWatch, Amazon Kinesis, Amazon SES, Launch

Amazon Simple Email Service (SES) focuses on deliverability – getting email through to the intended recipients. In my launch blog post (Introducing the Amazon Simple Email Service), I noted that several factors influence delivery, including the level of trust that you have earned with multiple Internet Service Providers (ISPs) and the number of complaints and bounces that you generate.

Today we are launching a new set of sending metrics for SES. There are two aspects to this feature:

Event Stream – You can configure SES to publish a JSON-formatted record to Amazon Kinesis Firehose each time a significant event (sent, rejected, delivered, bounced, or complaint generated) occurs.

Metrics – You can configure SES to publish aggregate metrics to Amazon CloudWatch. You can add one or more tags to each message and use them as CloudWatch dimensions. Tagging messages gives you the power to track deliverability based on campaign, team, department, and so forth. You can then use this information to fine-tune your messages and your email strategy.
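For example, a tagged send might look like the sketch below; the configuration set (which routes the events and metrics to their destinations) and the tag names are hypothetical:

import boto3

ses = boto3.client("ses")

ses.send_email(
    Source="sender@example.com",
    Destination={"ToAddresses": ["recipient@example.com"]},
    Message={
        "Subject": {"Data": "Big Spring Sale!"},
        "Body": {"Text": {"Data": "Everything is 20% off this week."}},
    },
    ConfigurationSetName="my-metrics-config-set",  # hypothetical
    # Each tag becomes a dimension on the published CloudWatch metrics.
    Tags=[{"Name": "campaign", "Value": "spring-sale"}],
)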

To learn more, read about Email Sending Metrics on the Amazon Simple Email Service Blog.

Jeff;

Amazon CloudWatch Update – Extended Metrics Retention & User Interface Update

by Jeff Barr | in Amazon CloudWatch

Amazon CloudWatch is a monitoring service for your AWS resources and for the applications that you run on AWS. It collects and tracks metrics, monitors log files, and allows you to set alarms and respond to changes in your AWS resources.

Today we are launching several important enhancements to CloudWatch:

  • Extended Metrics Retention – CloudWatch now stores all metrics for 15 months.
  • Simplified Metric Selection – The CloudWatch Console now makes it easier for you to find and select the metrics of interest.
  • Improved Metric Graphing – Graphing of selected metrics is now easier and more flexible.

Let’s take a look!

Extended Metrics Retention
When we launched CloudWatch back in 2009 (New Features for Amazon EC2: Elastic Load Balancing, Auto Scaling, and Amazon CloudWatch), system metrics were stored for 14 days. Later, when we gave you the ability to publish your own metrics to CloudWatch, the same retention period applied. Many AWS customers would like to access and visualize data over longer periods of time. They want to detect and understand seasonal factors, observe monthly growth trends, and perform year-over-year analysis.

In order to support these use cases (and many others that you’ll undoubtedly dream up), CloudWatch now stores all metrics for 15 months at no extra charge. In order to keep the overall volume of data reasonable, historical data is stored at a lower level of granularity, as follows:

  • One minute data points are available for 15 days.
  • Five minute data points are available for 63 days.
  • One hour data points are available for 455 days (15 months).

In order to let you experience the value of this new feature right away, you can immediately access three months of retained metrics. Over the course of the next 12 months, retained history will continue to grow until a full 15 months is stored.

As a first-hand illustration of the value of extended metrics retention, this long-term view of the Duration of my AWS Lambda functions tells me that something unusual is happening every two weeks:
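To pull a similar long-range view programmatically, a sketch like this fetches hourly data points (the function name is hypothetical, and since a single GetMetricStatistics call returns at most 1,440 data points, longer ranges require multiple calls):

from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client("cloudwatch")

# One month of hourly Duration averages for a (hypothetical) Lambda function.
now = datetime.utcnow()
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/Lambda",
    MetricName="Duration",
    Dimensions=[{"Name": "FunctionName", "Value": "my-function"}],
    StartTime=now - timedelta(days=30),
    EndTime=now,
    Period=3600,  # one-hour data points are retained for 455 days
    Statistics=["Average"],
)

for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], round(point["Average"], 1))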

Simplified Metric Selection
In order to make it easier for you to find and select the metrics that you would like to examine and graph, the CloudWatch Console now includes a clutter-free card-style view with a clear demarcation between AWS metrics and custom metrics:

The next step is to search for metrics of interest. For example, I’m interested in metrics with “CPU” in the name:

From there I can drill down, select a metric, and graph it. Here’s the CPU Utilization metric across all of my EC2 instances:

I can also show multiple metrics on the same graph:

This is not as illuminating as I would like; I’ll show you how to fix that in just a minute.

Improved Metric Graphing
With metrics now stored for 15 months and easier to find and to select, the next step is to make them easier to comprehend. This is where the new tabs (Graphed metrics and Graph options) come into play.

The Graphed metrics tab gives me detailed control over each metric:

The Actions column on the right lets me create an alarm, duplicate a metric, or delete it from the graph.

I can click on any of the entries in the Statistic column to make a change:

The ability to duplicate a metric and then change a statistic lets me compare, for example, maximum and average CPU Utilization (I also changed the Period to 6 hours):

Now let’s take a look at the Y Axis control on the Graph options tab. In the last section I graphed seven EC2 metrics at the same time. Due to the variation in range, the graph was not as informative as it could be. I can now choose independent ranges for the left and right Y axes, and then assign metrics to the left or to the right:

I can edit the label for a metric by clicking on it:

I can also choose to set ranges for the left and the right Y axes:

I can use the new date picker to choose the time frame of interest to me. I can look at metrics for a desired absolute date range, or relative to the current date. Here’s how I choose an absolute date range:

And here’s how I choose a relative one:

I can rename a graph by clicking on the current name and entering a new one:

After I have fine-tuned the graph to my liking I can add it to an existing dashboard or create a new one:

Here’s one of my existing dashboards with the new graph at the bottom:

Available Now
The extended metric storage and all of the features that I described above are now available in the US East (Northern Virginia), US West (Oregon), US West (Northern California), EU (Ireland), EU (Frankfurt), South America (São Paulo), Asia Pacific (Singapore), Asia Pacific (Tokyo), Asia Pacific (Seoul), Asia Pacific (Mumbai), and Asia Pacific (Sydney) Regions.

As I noted before, there is no extra charge for the extended metric storage.

Jeff;

 

New – CloudWatch Plugin for collectd

by Jeff Barr | in Amazon CloudWatch, Amazon EC2

You have had the power to store your own business, application, and system metrics in Amazon CloudWatch for quite some time (see New – Custom Metrics for Amazon CloudWatch to learn more). As I wrote way back in 2011 when I introduced this feature, “You can view graphs, set alarms, and initiate automated actions based on these metrics, just as you can for the metrics that CloudWatch already stores for your AWS resources.”

Today we are simplifying the process of collecting statistics from your system and getting them into CloudWatch with the introduction of a new CloudWatch plugin for collectd. By combining collectd’s ability to gather many different types of statistics with the CloudWatch features for storage, display, alerting, and alarming, you can become better informed about the state and performance of your EC2 instances, your on-premises hardware, and the applications running on them. The plugin is being released as an open source project and we are looking forward to your pull requests.

The collectd daemon is written in C for performance and portability. It supports over one hundred plugins, allowing you to collect statistics on Apache and Nginx web server performance, memory usage, uptime, and much more.

Installation and Configuration
I installed and configured collectd and the new plugin on an EC2 instance in order to see it in action.

To get started I created an IAM Policy with permission to write metrics data to CloudWatch:

Then I created an IAM Role that allows EC2 (and hence the collectd code running on my instance) to use my Policy:

If I were planning to use the plugin to collect statistics from my on-premises servers, or if my EC2 instances were already running, I could have skipped these steps and created an IAM user with the appropriate permissions instead. Had I done this, I would have had to put the user’s credentials on the servers or instances.

With the Policy and the Role in place, I launched an EC2 instance and selected the Role:

I logged in and installed collectd:

$ sudo yum -y install collectd

Then I fetched the plugin and the install script (setup.py, from the awslabs/collectd-cloudwatch repository on GitHub), made the script executable, and ran it:

$ chmod a+x setup.py
$ sudo ./setup.py

I answered a few questions and the setup ran without incident, starting up collectd after configuring it:

Installing dependencies ... OK
Installing python dependencies ... OK
Copying plugin tar file ... OK
Extracting plugin ... OK
Moving to collectd plugins directory ... OK
Copying CloudWatch plugin include file ... OK

Choose AWS region for published metrics:
  1. Automatic [us-east-1]
  2. Custom
Enter choice [1]: 1

Choose hostname for published metrics:
  1. EC2 instance id [i-057d2ed2260c3e251]
  2. Custom
Enter choice [1]: 1

Choose authentication method:
  1. IAM Role [Collectd_PutMetricData]
  2. IAM User
Enter choice [1]: 1

Choose how to install CloudWatch plugin in collectd:
  1. Do not modify existing collectd configuration
  2. Add plugin to the existing configuration
Enter choice [2]: 2
Plugin configuration written successfully.
Stopping collectd process ... NOT OK
Starting collectd process ... OK
$

With collectd running and the plugin installed and configured, the next step was to decide on the statistics of interest and configure the plugin to publish them to CloudWatch (note that there is a per-metric cost so this is an important step).

The file /opt/collectd-plugins/cloudwatch/config/blocked_metrics contains a list of metrics that have been collected but not published to CloudWatch:

$ cat /opt/collectd-plugins/cloudwatch/config/blocked_metrics
# This file is automatically generated - do not modify this file.
# Use this file to find metrics to be added to the whitelist file instead.
cpu-0-cpu-user
cpu-0-cpu-nice
cpu-0-cpu-system
cpu-0-cpu-idle
cpu-0-cpu-wait
cpu-0-cpu-interrupt
cpu-0-cpu-softirq
cpu-0-cpu-steal
interface-lo-if_octets-
interface-lo-if_packets-
interface-lo-if_errors-
interface-eth0-if_octets-
interface-eth0-if_packets-
interface-eth0-if_errors-
memory--memory-used
load--load-
memory--memory-buffered
memory--memory-cached

I was interested in memory consumption so I added one line to /opt/collectd-plugins/cloudwatch/config/whitelist.conf:

memory--memory-.*

The collectd configuration file (/etc/collectd.conf) contains additional settings for collectd and the plugins. I did not need to make any changes to it.

I restarted collectd so that it would pick up the change:

$ sudo service collectd restart

I exercised my instance a bit in order to consume some memory, and then opened up the CloudWatch Console to locate and display my metrics:

This screenshot includes a preview of an upcoming enhancement to the CloudWatch Console; don’t worry if yours doesn’t look as cool (stay tuned for more information on this).
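You can also confirm that the metrics are flowing without opening the console; this sketch lists them with boto3, assuming the plugin’s default namespace of collectd:

import boto3

cloudwatch = boto3.client("cloudwatch")

# List the metrics that the plugin has published so far.
paginator = cloudwatch.get_paginator("list_metrics")
for page in paginator.paginate(Namespace="collectd"):
    for metric in page["Metrics"]:
        dims = {d["Name"]: d["Value"] for d in metric["Dimensions"]}
        print(metric["MetricName"], dims)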

If I had been monitoring a production instance, I could have installed one or more of the collectd plugins. Here’s a list of what’s available on the Amazon Linux AMI:

$ sudo yum list | grep collectd
collectd.x86_64                        5.4.1-1.11.amzn1               @amzn-main
collectd-amqp.x86_64                   5.4.1-1.11.amzn1               amzn-main
collectd-apache.x86_64                 5.4.1-1.11.amzn1               amzn-main
collectd-bind.x86_64                   5.4.1-1.11.amzn1               amzn-main
collectd-curl.x86_64                   5.4.1-1.11.amzn1               amzn-main
collectd-curl_xml.x86_64               5.4.1-1.11.amzn1               amzn-main
collectd-dbi.x86_64                    5.4.1-1.11.amzn1               amzn-main
collectd-dns.x86_64                    5.4.1-1.11.amzn1               amzn-main
collectd-email.x86_64                  5.4.1-1.11.amzn1               amzn-main
collectd-generic-jmx.x86_64            5.4.1-1.11.amzn1               amzn-main
collectd-gmond.x86_64                  5.4.1-1.11.amzn1               amzn-main
collectd-ipmi.x86_64                   5.4.1-1.11.amzn1               amzn-main
collectd-iptables.x86_64               5.4.1-1.11.amzn1               amzn-main
collectd-ipvs.x86_64                   5.4.1-1.11.amzn1               amzn-main
collectd-java.x86_64                   5.4.1-1.11.amzn1               amzn-main
collectd-lvm.x86_64                    5.4.1-1.11.amzn1               amzn-main
collectd-memcachec.x86_64              5.4.1-1.11.amzn1               amzn-main
collectd-mysql.x86_64                  5.4.1-1.11.amzn1               amzn-main
collectd-netlink.x86_64                5.4.1-1.11.amzn1               amzn-main
collectd-nginx.x86_64                  5.4.1-1.11.amzn1               amzn-main
collectd-notify_email.x86_64           5.4.1-1.11.amzn1               amzn-main
collectd-postgresql.x86_64             5.4.1-1.11.amzn1               amzn-main
collectd-rrdcached.x86_64              5.4.1-1.11.amzn1               amzn-main
collectd-rrdtool.x86_64                5.4.1-1.11.amzn1               amzn-main
collectd-snmp.x86_64                   5.4.1-1.11.amzn1               amzn-main
collectd-varnish.x86_64                5.4.1-1.11.amzn1               amzn-main
collectd-web.x86_64                    5.4.1-1.11.amzn1               amzn-main

Things to Know
If you are using version 5.5 or newer of collectd, four metrics are now published by default:

  • df-root-percent_bytes-used – disk utilization
  • memory--percent-used – memory utilization
  • swap--percent-used – swap utilization
  • cpu--percent-active – cpu utilization

You can remove these from the whitelist.conf file if you don’t want them to be published.

The primary repositories for the Amazon Linux AMI, Ubuntu, RHEL, and CentOS currently provide older versions of collectd; please be aware of this change in the default behavior if you install from a custom repo or build from source.

Lots More
There’s quite a bit more than I had time to show you. You can install more plugins and then configure whitelist.conf to publish even more metrics to CloudWatch. You can create CloudWatch Alarms, set up Custom Dashboards, and more.

To get started, visit AWS Labs on GitHub and download the CloudWatch plugin for collectd.

Jeff;

 

Improvements to CloudWatch Logs & Dashboards

by Jeff Barr | in Amazon CloudWatch, Launch

Amazon CloudWatch helps you to see, diagnose, react to, and resolve issues that arise in your AWS infrastructure and in the applications that you run on AWS. Today, I would like to talk about several usability and functionality improvements to CloudWatch Logs (Store and Monitor OS & Application Log Files with Amazon CloudWatch) and to CloudWatch Dashboards (CloudWatch Dashboards – Create & Use Customized Metrics Views).

Usability Improvements to CloudWatch Logs
CloudWatch Logs is a highly available, scalable, durable, and secure service to manage your operating system and application log files. It allows you to ingest, store, filter, search, and archive the logs, reducing your operational burden and allowing you to focus on your application and your business.

In order to help you to stay efficient and productive even as the number and size of your logs grows, we have made several usability improvements to the CloudWatch Logs Console:

  • Improved formatting for log data.
  • Simplified access to lengthy log files.
  • Easier searching within a log group.
  • Simplified collaboration around log files.
  • Better searching within a specific time frame.

Prior to today’s launch we also made some improvements to the CloudWatch Dashboards:

  • Full screen mode.
  • Dark theme.
  • Control over range of the Y axis on charts.
  • Simplified renaming of charts.
  • Persistent storage of chart settings.

CloudWatch Logs Console in Action
Let’s take a look at each of these improvements!

Open up the CloudWatch Logs Console, click on a Log Group, and then on a Log Stream within the group. Find the View options menu on the right:

Click on Expand all in order to see the log messages in expanded, multi-line form like this:

You can also Switch to text view in order to see the logs in their unadorned, plain-text form:

We have also improved the display of log data across all streams within a log group. Once you select a Log Group and click Search Events you can see the log data from all streams within that log group. For example, I can easily identify the Billed Duration for multiple invocations of a single Lambda function:
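The same cross-stream search can be scripted with filter_log_events; here’s a sketch that assumes a hypothetical Lambda log group:

import boto3

logs = boto3.client("logs")

# Search every stream in the log group for the billing summary lines.
response = logs.filter_log_events(
    logGroupName="/aws/lambda/my-function",  # hypothetical log group
    filterPattern='"Billed Duration"',
)
for event in response["events"]:
    print(event["logStreamName"], event["message"].strip())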

Even better, we have replaced the original paginated view with an infinite scroll bar. You can now scroll to your heart’s content through log files of any length:

You can now refine your search to a specific time frame or to a custom date range with a single click, like this:

If you are working as part of a team, you can now share the URL of your log analysis session. The URL captures the search parameters and filters, and includes a fragment that looks like this:

group=<log_group_name>_log;stream=<log_stream_name>;filter=<filter_parameter>;start=PT<time_frame>

These improvements to the CloudWatch Logs Console are available now and you can start using them today. To learn more, read Getting Started with CloudWatch Logs.

Recent Improvements to the CloudWatch Dashboards
You may have already noticed the improvements that we recently made to the CloudWatch Dashboards. First, there’s a new full screen mode for Dashboards, accessible by clicking on Enter full screen in the Actions menu:

Once you are in full screen mode, you can click on Dark to switch to the new, night-owl-friendly dark theme:

Here’s a simple Redis dashboard in full screen mode using the dark theme:

Sometimes you want to have more control over how a chart is displayed on your dashboard. As an example, outliers in your data may make your chart less readable, and you may want to keep the dashboard focused on a specific Y axis range. Here’s a chart where that’s the case; the outlier masks the trend that happened after the big spike:

To edit the Y axis, click on the tool selector and select Edit:

Choose Graph Options and then edit the values for the Y axis until you are satisfied with the appearance of the chart, then click on Update widget:

Here’s what the chart looks like after that:

Many of our customers wanted to be able to rename a chart without leaving the dashboard. You can now do that with a click (hover your mouse near the name and then click on the pencil):

Finally, CloudWatch now remembers the time range, timezone preference, refresh interval, and auto-refresh setting for each chart!

Amazon CloudWatch Partner Ecosystem
I’d like to wrap things up by sharing some of the great work that our partners are doing. The following partners are building value-added solutions on top of CloudWatch:

  • Datadog provides integrations to key items in your infrastructure, and gives you the ability to collaborate with your team directly when dealing with incidents.
  • Librato provides integrations across elements of your infrastructure, and supports composite metrics and mathematical transformations to time series data.
  • SignalFx helps provide you with instant visibility into your metrics, and focuses on data analytics and on delivering alerts on service-wide patterns.
  • Splunk offers a platform for operational intelligence that enables you to collect machine data and find insights.
  • Sumo Logic is a machine data analytics service for log management and time series metrics that helps you build, run and secure your applications.
  • CloudCheckr integrates regular and custom metrics across your environment to assist in cost, security, and performance management by identifying right-sizing opportunities and alerting for potential security and performance issues.

If you are a partner and offer something that belongs on this list, let me know and I’ll update it ASAP!

Jeff;