How to Monitor Your Resources Effectively
This article is part of a technical content series crafted by AWS Startup Solutions Architects to help guide early-stage startups in setting the foundations needed to start building quickly and easily. The series offers a high-level overview of the technical decisions startup founders need to make when getting off the ground, along with which AWS services are best suited to address those decisions.
One of the most important things for startups is to develop and release their Minimum Viable Product (MVP) and start getting feedback on it as soon as possible. In the rush to bring the application to market, monitoring is an area that sometimes gets overlooked. Yet monitoring is critical for business continuity and for understanding the user experience. In this article, we discuss resource monitoring and show how a few AWS services can help you work toward operational excellence with little setup.
As AWS Startup Solutions Architects, we have seen many companies focus on operational excellence to build strong trust with their customers and offer a great user experience. Being able to run systems and gain insight into their operations helps you continuously improve supporting processes and procedures. It also lets you act in real time when you face performance, configuration, or security issues.
We recommend that startups begin with monitoring in a reactive mode: gain visibility into the system through a set of metrics and a dashboard, and respond when something goes wrong. As you grow, move to a proactive approach in which the system heals itself, which is how you reach operational excellence.
Before we talk about specific services, it’s important to understand what resource monitoring is. It means monitoring AWS resources such as Amazon EC2, Amazon RDS, container services, or serverless services like AWS Lambda and Amazon DynamoDB for latency, traffic, errors, and saturation. For example, a growing startup that uses AWS Elastic Beanstalk to deploy its web application might need to scale its Amazon RDS instance if metrics such as network throughput, CPU utilization, or input/output operations per second (IOPS) stay very high.
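To make the RDS example concrete, here is a minimal Python sketch that creates one CloudWatch alarm per metric named above, using the AWS/RDS namespace and the DBInstanceIdentifier dimension. The thresholds, alarm names, DB instance identifier, and SNS topic ARN are hypothetical placeholders; choose thresholds that match your own workload. The functions accept any client object with a put_metric_alarm method, such as a boto3 CloudWatch client.

```python
# Sketch: one CloudWatch alarm per RDS metric mentioned in the text.
# Assumptions (hypothetical): thresholds, alarm naming scheme, DB instance ID, SNS topic ARN.

RDS_THRESHOLDS = {
    "CPUUtilization": 80.0,                   # percent
    "ReadIOPS": 1000.0,                       # read operations per second
    "WriteIOPS": 1000.0,                      # write operations per second
    "NetworkReceiveThroughput": 50_000_000.0, # bytes per second
}


def rds_alarm_params(db_instance_id, metric_name, threshold, topic_arn):
    """Build keyword arguments for CloudWatch's PutMetricAlarm API."""
    return {
        "AlarmName": f"{db_instance_id}-{metric_name}-high",
        "Namespace": "AWS/RDS",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": db_instance_id}],
        "Statistic": "Average",
        "Period": 300,            # 5-minute datapoints
        "EvaluationPeriods": 3,   # alarm after 15 minutes above the threshold
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],  # notify the SNS topic when the alarm fires
    }


def create_rds_alarms(cloudwatch, db_instance_id, topic_arn):
    """Create every alarm with the given client and return the alarm names."""
    names = []
    for metric_name, threshold in RDS_THRESHOLDS.items():
        params = rds_alarm_params(db_instance_id, metric_name, threshold, topic_arn)
        cloudwatch.put_metric_alarm(**params)
        names.append(params["AlarmName"])
    return names
```

With AWS credentials configured, you could pass boto3.client("cloudwatch") as the cloudwatch argument, along with your own instance identifier and topic ARN.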
Similarly, a startup with a serverless architecture built on a NoSQL database like DynamoDB might need to scale its read capacity to handle a surge in traffic.
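For the DynamoDB case, one option (assuming the table uses provisioned capacity; tables in on-demand mode scale without this setup) is Application Auto Scaling with a target-tracking policy on read capacity utilization. The sketch below builds the RegisterScalableTarget and PutScalingPolicy requests; the table name, capacity bounds, and 70 percent target are hypothetical values for illustration.

```python
# Sketch: target-tracking auto scaling for a provisioned DynamoDB table's reads.
# Assumptions (hypothetical): table name, min/max read capacity units, 70% target.

def read_scaling_params(table_name, min_rcu=5, max_rcu=500, target_pct=70.0):
    """Return (register_scalable_target kwargs, put_scaling_policy kwargs)."""
    resource_id = f"table/{table_name}"
    dimension = "dynamodb:table:ReadCapacityUnits"
    target = {
        "ServiceNamespace": "dynamodb",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "MinCapacity": min_rcu,
        "MaxCapacity": max_rcu,
    }
    policy = {
        "PolicyName": f"{table_name}-read-target-tracking",
        "ServiceNamespace": "dynamodb",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Keep consumed/provisioned read capacity near this percentage.
            "TargetValue": target_pct,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "DynamoDBReadCapacityUtilization"
            },
        },
    }
    return target, policy


def enable_read_autoscaling(app_autoscaling, table_name):
    """Register the table as a scalable target, then attach the policy."""
    target, policy = read_scaling_params(table_name)
    app_autoscaling.register_scalable_target(**target)
    app_autoscaling.put_scaling_policy(**policy)
```

With a real client you would pass boto3.client("application-autoscaling"). Target tracking was chosen here because Application Auto Scaling then creates and manages the underlying CloudWatch alarms for you.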
For resource monitoring, we recommend Amazon CloudWatch, a monitoring and observability service built for DevOps engineers, developers, site reliability engineers (SREs), and IT managers. It provides you with data and actionable insights to monitor your applications, respond to system-wide performance changes, and optimize resource utilization. The service collects data in the form of logs, metrics, and events, giving you a unified view of AWS resources, applications, and services. You can also monitor custom metrics generated by your own applications and services. Amazon CloudWatch lets you detect anomalous behavior in your environments, set alarms, and take automated actions to keep your applications running smoothly.
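Custom metrics are published with the CloudWatch PutMetricData API. The sketch below assumes a hypothetical checkout-latency measurement taken by your own application; the namespace, metric name, and Environment dimension are illustrative choices, not required values.

```python
# Sketch: publish an application-level (custom) metric to CloudWatch.
# Assumptions (hypothetical): "MyStartup/Checkout" namespace, CheckoutLatency
# metric name, and the Environment dimension.

def checkout_latency_params(latency_ms, environment="production"):
    """Build keyword arguments for CloudWatch's PutMetricData API."""
    return {
        "Namespace": "MyStartup/Checkout",  # custom namespaces cannot begin with "AWS/"
        "MetricData": [
            {
                "MetricName": "CheckoutLatency",
                "Dimensions": [{"Name": "Environment", "Value": environment}],
                "Value": float(latency_ms),
                "Unit": "Milliseconds",
            }
        ],
    }


def publish_checkout_latency(cloudwatch, latency_ms, environment="production"):
    """Send one datapoint using a boto3 CloudWatch client (or a compatible object)."""
    cloudwatch.put_metric_data(**checkout_latency_params(latency_ms, environment))
```

Once published, a custom metric can be graphed and alarmed on the same way as the built-in AWS service metrics.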
Let’s go back to our first example of AWS Elastic Beanstalk. Suppose a surge in traffic drives the CPU utilization of your EC2 instances very high. In a reactive mode, you create a CloudWatch alarm that monitors CPU utilization and sends an email when it exceeds the threshold you define. Because CloudWatch alarms are integrated with Amazon Simple Notification Service (Amazon SNS), you can use any notification type that SNS supports.
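A minimal sketch of this reactive setup in Python, assuming you pass in boto3 SNS and CloudWatch clients: it creates an SNS topic, subscribes an email address, and alarms on average CPU utilization across the environment's Auto Scaling group. The Auto Scaling group name, email address, 80 percent threshold, and evaluation window are hypothetical. Elastic Beanstalk generates its own Auto Scaling group name, which you would look up for your environment.

```python
# Sketch: reactive CPU alarm that emails the team through SNS.
# Assumptions (hypothetical): Auto Scaling group name, email address, 80% threshold.

def cpu_alarm_params(asg_name, topic_arn, threshold=80.0):
    """Build PutMetricAlarm kwargs for average CPU across an Auto Scaling group."""
    return {
        "AlarmName": f"{asg_name}-cpu-high",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "AutoScalingGroupName", "Value": asg_name}],
        "Statistic": "Average",
        "Period": 300,            # 5-minute datapoints
        "EvaluationPeriods": 2,   # two consecutive breaches before alarming
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],  # notify when the alarm enters ALARM
        "OKActions": [topic_arn],     # and again when it returns to OK
    }


def create_cpu_alarm_with_email(sns, cloudwatch, asg_name, email):
    """Create the topic, subscribe the email address, create the alarm."""
    topic_arn = sns.create_topic(Name=f"{asg_name}-alerts")["TopicArn"]
    # Email subscriptions stay pending until the recipient confirms them.
    sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint=email)
    cloudwatch.put_metric_alarm(**cpu_alarm_params(asg_name, topic_arn))
    return topic_arn
```

With credentials configured, you would call create_cpu_alarm_with_email(boto3.client("sns"), boto3.client("cloudwatch"), group_name, address).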
Many customers go further and use AWS Lambda to integrate alarms with third-party business communication applications or incident response platforms, or to run remediation steps.
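As one example of such an integration, the hypothetical Lambda function below is subscribed to the alarm's SNS topic and forwards each alarm to a chat or incident webhook. It reads the AlarmName, NewStateValue, and NewStateReason fields from the CloudWatch alarm notification carried in the SNS message. The WEBHOOK_URL environment variable and the {"text": ...} payload shape are assumptions; adapt them to the platform you use.

```python
# Sketch: Lambda function (SNS trigger) that forwards CloudWatch alarms to a webhook.
# Assumptions (hypothetical): WEBHOOK_URL env var and the {"text": ...} payload format.
import json
import os
import urllib.request


def format_alert(alarm):
    """Turn a CloudWatch alarm notification into a one-line message."""
    return f"[{alarm['NewStateValue']}] {alarm['AlarmName']}: {alarm['NewStateReason']}"


def post_json(url, payload):
    """POST a JSON payload and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status


def lambda_handler(event, context, send=post_json):
    """Entry point; `send` is injectable so the handler can be tested offline."""
    url = os.environ["WEBHOOK_URL"]
    statuses = []
    for record in event["Records"]:
        # The SNS message body is the alarm notification as a JSON string.
        alarm = json.loads(record["Sns"]["Message"])
        statuses.append(send(url, {"text": format_alert(alarm)}))
    return {"delivered": len(statuses)}
```

The same handler is a natural place to add remediation logic later, such as restarting a service, once you trust the alarm.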
Next, take a proactive approach: configure the alarm to perform an automated action, such as executing an Amazon EC2 Auto Scaling policy that adds EC2 instances (scaling horizontally) so the traffic is distributed across more instances.
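The proactive version can be sketched the same way: create a simple scaling policy on the Auto Scaling group and make it the alarm's action. The group name, 70 percent threshold, and one-instance step are hypothetical. Elastic Beanstalk can also configure scaling triggers for its environment directly, so treat this as an illustration of the underlying mechanism rather than the only way to set it up.

```python
# Sketch: an alarm that executes an EC2 Auto Scaling policy instead of emailing.
# Assumptions (hypothetical): group name, 70% threshold, +1 instance per breach.

def scale_out_on_high_cpu(autoscaling, cloudwatch, asg_name, threshold=70.0):
    """Attach a simple scale-out policy and trigger it from a CPU alarm."""
    policy = autoscaling.put_scaling_policy(
        AutoScalingGroupName=asg_name,
        PolicyName=f"{asg_name}-scale-out",
        PolicyType="SimpleScaling",
        AdjustmentType="ChangeInCapacity",
        ScalingAdjustment=1,   # add one instance each time the policy runs
        Cooldown=300,          # wait 5 minutes before scaling again
    )
    cloudwatch.put_metric_alarm(
        AlarmName=f"{asg_name}-cpu-scale-out",
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "AutoScalingGroupName", "Value": asg_name}],
        Statistic="Average",
        Period=300,
        EvaluationPeriods=2,
        Threshold=threshold,
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=[policy["PolicyARN"]],  # the alarm runs the scaling policy
    )
    return policy["PolicyARN"]
```

Pass boto3.client("autoscaling") and boto3.client("cloudwatch") to run it against a real account. A matching scale-in policy and alarm would let the group shrink again when traffic drops.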
Defining alarm thresholds can be tricky. To avoid manual configuration and trial-and-error experimentation, use the CloudWatch anomaly detection feature on your metrics. Amazon CloudWatch Anomaly Detection applies machine learning algorithms to continuously analyze the metrics of systems and applications, determine normal baselines, and surface anomalies with minimal user intervention. You can then create alarms based on a metric's expected value.
These types of alarms don't have a static threshold for determining alarm state. Instead, they compare the metric's value to the expected value based on the anomaly detection model.
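An anomaly detection alarm replaces the static Threshold with a ThresholdMetricId that points at an ANOMALY_DETECTION_BAND metric math expression. The sketch below builds such a PutMetricAlarm request for the EC2 CPU example; the group name, topic ARN, and band width of 2 are hypothetical choices.

```python
# Sketch: CPU alarm that compares the metric to an anomaly detection band.
# Assumptions (hypothetical): Auto Scaling group name, SNS topic ARN, band width 2.

def anomaly_alarm_params(asg_name, topic_arn, band_width=2):
    """Build PutMetricAlarm kwargs using ANOMALY_DETECTION_BAND instead of a Threshold."""
    return {
        "AlarmName": f"{asg_name}-cpu-anomaly",
        "Metrics": [
            {
                "Id": "m1",  # the raw metric being watched
                "MetricStat": {
                    "Metric": {
                        "Namespace": "AWS/EC2",
                        "MetricName": "CPUUtilization",
                        "Dimensions": [{"Name": "AutoScalingGroupName", "Value": asg_name}],
                    },
                    "Period": 300,
                    "Stat": "Average",
                },
                "ReturnData": True,
            },
            {
                "Id": "ad1",  # the expected-value band; a larger band_width widens it
                "Expression": f"ANOMALY_DETECTION_BAND(m1, {band_width})",
                "Label": "CPUUtilization (expected)",
                "ReturnData": True,
            },
        ],
        "ThresholdMetricId": "ad1",  # compare m1 against the band, not a fixed number
        "ComparisonOperator": "GreaterThanUpperThreshold",
        "EvaluationPeriods": 2,
        "AlarmActions": [topic_arn],
    }


def create_anomaly_alarm(cloudwatch, asg_name, topic_arn):
    """Create the alarm with a boto3 CloudWatch client and return its name."""
    params = anomaly_alarm_params(asg_name, topic_arn)
    cloudwatch.put_metric_alarm(**params)
    return params["AlarmName"]
```

GreaterThanUpperThreshold only fires on unusually high values, which suits a CPU alarm; LessThanLowerOrGreaterThanUpperThreshold would also catch unexpected drops.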
Finally, to monitor your resources in a single view, use Amazon CloudWatch dashboards to create customized views of the critical resource and application measurements and alarms for your AWS resources. With Automatic Dashboards, you get aggregated views of the health and performance of all your AWS resources. Automatic Dashboards come prebuilt with AWS service-recommended best practices, are resource-aware, and update dynamically to reflect the latest state of important performance metrics. This lets you get started with monitoring quickly, explore account-level and resource-level views of metrics and alarms, and drill down to the root cause of performance issues.
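Automatic Dashboards need no setup, but a custom dashboard can also be defined as code with the CloudWatch PutDashboard API, whose body is a JSON document listing widgets. This sketch combines EC2 CPU, RDS IOPS, and an alarm status widget; the group name, DB instance identifier, alarm ARNs, layout, and region are hypothetical.

```python
# Sketch: a custom CloudWatch dashboard defined in code.
# Assumptions (hypothetical): resource names, alarm ARNs, region, and widget layout.
import json


def dashboard_body(asg_name, db_instance_id, alarm_arns, region="us-east-1"):
    """Build the dashboard body: two metric graphs and one alarm status widget."""
    return {
        "widgets": [
            {
                "type": "metric", "x": 0, "y": 0, "width": 12, "height": 6,
                "properties": {
                    "title": "Web tier CPU",
                    "region": region, "stat": "Average", "period": 300,
                    "metrics": [["AWS/EC2", "CPUUtilization", "AutoScalingGroupName", asg_name]],
                },
            },
            {
                "type": "metric", "x": 12, "y": 0, "width": 12, "height": 6,
                "properties": {
                    "title": "Database IOPS",
                    "region": region, "stat": "Average", "period": 300,
                    "metrics": [
                        ["AWS/RDS", "ReadIOPS", "DBInstanceIdentifier", db_instance_id],
                        ["AWS/RDS", "WriteIOPS", "DBInstanceIdentifier", db_instance_id],
                    ],
                },
            },
            {
                "type": "alarm", "x": 0, "y": 6, "width": 24, "height": 3,
                "properties": {"title": "Alarms", "alarms": alarm_arns},
            },
        ]
    }


def publish_dashboard(cloudwatch, name, body):
    """Create or replace the dashboard; the API expects the body as a JSON string."""
    cloudwatch.put_dashboard(DashboardName=name, DashboardBody=json.dumps(body))
```

Keeping the dashboard in code means it can be reviewed and recreated alongside the resources it describes.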
Resource monitoring is a critical part of building and scaling your business, and it can help you achieve operational excellence. As your startup moves from a reactive monitoring strategy to a proactive one, use CloudWatch metrics to monitor your resources, configure CloudWatch alarms to generate alerts, and set up dashboards to get a unified view of your important resources.
Have fun, and build on!