Monitoring Microservice-Based Cloud Applications Using Distributed Tracing

By Ran Ribenzaft, Co-Founder & CTO at Epsagon
By Trevor Hansen, Sr. Partner Solutions Architect at AWS

Applications running on the cloud are getting more and more complex. It is not uncommon, for example, to find systems defined as a set of microservices that cooperate not only with each other, but also with the cloud provider's native services and with solutions from ecosystem partners.

As application complexity increases, the debugging process in production environments gets more complicated as well.

Amazon Web Services (AWS) understands this challenge and includes tracing tools in its cloud services. For instance, AWS X-Ray helps developers analyze and debug distributed applications, such as those built using a microservices architecture.

Epsagon builds on AWS tools by providing automated end-to-end tracing across distributed AWS services, as well as services outside of AWS. Epsagon is an AWS Partner Network (APN) Advanced Technology partner specializing in automated tracing for cloud microservices.

In this post, we’ll explain the importance of distributed tracing for microservice-based cloud applications, and walk you through an example of how to use the Epsagon solution.

Tools for Managing Microservices on AWS

A microservice architecture has proven to be an effective way of structuring cloud applications across a wide range of domains and scenarios.

Among their many advantages, microservices enable you to benefit from independent teams and heterogeneous ecosystems. Structuring your code into microservices also makes it easier to maintain and deploy the cloud services you develop, and to integrate with those available within your ecosystem.

However, working with a greater number of smaller services instead of a single application presents its own management challenges.

AWS offers a rich set of tools to manage those microservices, including Amazon Elastic Container Service (Amazon ECS), AWS Fargate, AWS Lambda, Amazon Elastic Kubernetes Service (Amazon EKS), and Amazon Elastic Compute Cloud (Amazon EC2).

Most of those tools are administered by AWS, which allows your team to focus on business logic instead of server maintenance and provisioning. For this reason, managed services make developers more agile. They free up your in-house engineering resources, and their use of containers supports reproducibility across different environments.

Managed services limit how much you can customize, however. For cases that need more control, AWS also provides unmanaged services. Though they require you to handle provisioning and maintenance yourself, they give you more freedom to customize.

Why Microservices Need Distributed Tracing

AWS microservice management tools provide some monitoring capabilities. For instance, any Lambda functions you use stream log entries to Amazon CloudWatch by default, so you can examine the workloads your functions run with some level of granularity.

Deployment metrics reveal the number of invocations, error rate, and execution time for your applications running on Lambda functions.

When you are troubleshooting the cloud services you built from multiple microservices, whether yours or those of your ecosystem partners, it can be difficult to identify the source of a problem by studying log entries and deployment metrics alone.

An automated distributed tracing capability such as Epsagon can reveal the real source of a fault, performance slowdown, security breach, or other problem at any given moment.

What is Distributed Tracing?

Distributed tracing gives you visibility into individual calls and functions while also letting you follow the full lifecycle of a request. This means you can see where each request spends its time, such as calling native services like Amazon DynamoDB or third-party services.

Distributed tracing is implemented through instrumentation. In other words, you use the software development kit (SDK) of the selected provider to add code that teaches the application how to stream events to the logs. The provider can then correlate those events and make your system a white box for debugging.
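To make the idea of instrumentation concrete, here is a minimal sketch that uses only the Python standard library. It is not the Epsagon or X-Ray SDK; the names span, EVENTS, and handle_order are hypothetical and exist only for illustration. Each span records its duration and shares a trace ID with the other spans of the same request, which is what lets a backend correlate them later.

```python
import json
import time
import uuid
from contextlib import contextmanager

# Collected span events; a real SDK would ship these to a tracing backend.
EVENTS = []


@contextmanager
def span(trace_id, name):
    """Record how long the wrapped block takes, tagged with the trace ID."""
    start = time.perf_counter()
    error = None
    try:
        yield
    except Exception as exc:
        error = repr(exc)
        raise
    finally:
        EVENTS.append({
            "trace_id": trace_id,
            "name": name,
            "duration_ms": (time.perf_counter() - start) * 1000,
            "error": error,
        })


def handle_order(order):
    # One trace ID per incoming request ties all of its spans together.
    trace_id = uuid.uuid4().hex
    with span(trace_id, "handle_order"):
        with span(trace_id, "validate"):
            if order["amount"] <= 0:
                raise ValueError("amount must be positive")
        with span(trace_id, "publish"):
            time.sleep(0.01)  # stand-in for a call to another service
    return trace_id


if __name__ == "__main__":
    handle_order({"order": "1", "amount": 400})
    for event in EVENTS:
        print(json.dumps(event))
```

Note that the application code itself had to change: every block we want to observe is wrapped by hand, and the inner spans are recorded before the outer one because they finish first.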

What is Automated Tracing?

Automated tracing means you get observability into your system without changing the actual code of the application. You can see how your requests behave end to end without any code updates.
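One common way to get this effect in Python is to wrap existing functions at load time, so the application code stays untouched. The sketch below illustrates that general technique only; we are not claiming it is how Epsagon implements automated tracing, and traced, auto_instrument, TRACE_LOG, and OrderClient are hypothetical names.

```python
import functools
import time

# Collected call records; a real tracer would send these to a backend.
TRACE_LOG = []


def traced(func):
    """Wrap func so each call is timed and logged, leaving its body untouched."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            TRACE_LOG.append({
                "call": func.__qualname__,
                "duration_ms": (time.perf_counter() - start) * 1000,
            })
    return wrapper


def auto_instrument(obj, names):
    """Replace the named attributes of obj (a module or class) with traced versions."""
    for name in names:
        setattr(obj, name, traced(getattr(obj, name)))


class OrderClient:
    """Hypothetical application code; it contains no tracing logic."""

    def send(self, order):
        return {"status": "accepted", "order": order["order"]}


# The tracer patches the class from the outside, once, at startup.
auto_instrument(OrderClient, ["send"])
result = OrderClient().send({"order": "1"})
```

After the patch, every call to OrderClient.send adds an entry to TRACE_LOG, even though the class definition never mentions tracing.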

Tracing with AWS X-Ray

You can use AWS X-Ray if you deploy your application on one of the services it supports: Amazon EC2, Amazon ECS, AWS Lambda, or AWS Elastic Beanstalk.

X-Ray allows you to trace requests from beginning to end across all touch points of your distributed system. In other words, X-Ray gives you insights about your application’s performance by obtaining an end-to-end view of your system in a visual map of your application’s architecture.

With X-Ray, you can localize issues more easily and thus get your system back to normal faster.

Tracing with Epsagon

Epsagon is a fully managed software-as-a-service (SaaS) offering that includes tracing for all AWS services, third-party APIs (via HTTP calls), and other common services such as Redis, Kafka, and Elastic.

The Epsagon service includes monitoring capabilities, alerting to most common services, and payload visibility into each and every call your code is making.

Example: Using Epsagon Distributed Tracing

This procedure relies on a small retail web application we have hosted on GitHub. We will use that application to demonstrate how you can use Epsagon to run end-to-end distributed tracing on any distributed application.

Components

To demonstrate Epsagon’s support for different workloads, the application is deployed as a set of components that run on AWS Fargate.

Figure 1 – Architecture of the demo retail application.

Our demo application has three main components:

1. Initial HTTP endpoint on AWS Fargate

The initial endpoint receives the initial order. It is a Python Flask application that exposes the endpoint /order and waits for an order payload. Once it is deployed, you can hit the endpoint with the following payload:

POST http://<public_ip_provided_by_fargate>:8000/order 

{
 "order": "1",
 "product": "Echo Dot (2nd Generation)",
 "amount": 400
}
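If you prefer to send that request from a script, the following sketch builds it with Python's standard library. We assume here that the endpoint expects a JSON body with a Content-Type of application/json, which the payload above suggests but does not state. The helper name build_order_request and the address 203.0.113.10 (a documentation-only placeholder) are ours; substitute the public IP that Fargate gives you.

```python
import json
import urllib.request


def build_order_request(base_url, order):
    """Build (but do not send) a POST request for the /order endpoint."""
    body = json.dumps(order).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/order",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


order = {"order": "1", "product": "Echo Dot (2nd Generation)", "amount": 400}

# Placeholder address; replace with the public IP provided by Fargate.
request = build_order_request("http://203.0.113.10:8000", order)

# To actually send it once the service is deployed:
# with urllib.request.urlopen(request, timeout=5) as response:
#     print(response.status, response.read().decode("utf-8"))
```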

2. Amazon Simple Notification Service (Amazon SNS) topic

This Amazon SNS topic (arn:aws:sns:<your_region>:<your_owner>:orders) will be used by the initial HTTP endpoint so that it can send a message for further processing of orders.

This approach creates an asynchronous chain of events in the application so that we can respond quickly to a customer and decouple services.
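As a rough sketch of that hand-off, the initial endpoint could publish the order as a JSON string. The function publish_order is a hypothetical helper, not code from the demo repository. It calls publish with the TopicArn and Message keyword arguments, as the boto3 SNS client does, and the FakeSnsClient stand-in lets the sketch run without AWS credentials. The region and account ID in the ARN are placeholders.

```python
import json


def publish_order(sns_client, topic_arn, order):
    """Publish the order as a JSON string and return the client's response."""
    return sns_client.publish(TopicArn=topic_arn, Message=json.dumps(order))


class FakeSnsClient:
    """Local stand-in that accepts the same keyword arguments as boto3's publish."""

    def __init__(self):
        self.published = []

    def publish(self, TopicArn, Message):
        self.published.append((TopicArn, Message))
        return {"MessageId": "local-test"}


# Placeholder region and account ID; use your own topic ARN.
topic_arn = "arn:aws:sns:us-east-1:123456789012:orders"
order = {"order": "1", "product": "Echo Dot (2nd Generation)", "amount": 400}

client = FakeSnsClient()
response = publish_order(client, topic_arn, order)

# With real AWS credentials you would pass a boto3 client instead:
# import boto3
# response = publish_order(boto3.client("sns"), topic_arn, order)
```

Because the endpoint only waits for the publish call to return, it can respond to the customer before any downstream processing starts.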

3. Callback HTTP endpoint on AWS Fargate

The callback endpoint is a Python Flask application that has the endpoint /process_order and waits for an order payload.

Once a message is published on the SNS topic, the Amazon SNS system will do a POST call to the endpoint with the following payload:

POST http://<public_ip_provided_by_fargate>:8000/process_order

{
  ...
  "Message": {
    "order": "1",
    "product": "Echo Dot (2nd Generation)",
    "amount": 400
  }
  ...
}
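The payload above is simplified. In an actual Amazon SNS HTTP notification, the Message field arrives as a string, so an order published as JSON needs a second decoding step. SNS also sends a SubscriptionConfirmation message when the endpoint is first subscribed, and that message carries no order. A minimal sketch of the parsing logic follows; extract_order is a hypothetical helper, not code from the demo repository.

```python
import json


def extract_order(raw_body):
    """Return the order dict from an SNS notification body, or None for other message types."""
    envelope = json.loads(raw_body)
    if envelope.get("Type", "Notification") != "Notification":
        # For example SubscriptionConfirmation, which carries no order.
        return None
    message = envelope["Message"]
    # SNS delivers Message as a string; decode it if the publisher sent JSON.
    if isinstance(message, str):
        message = json.loads(message)
    return message


order = {"order": "1", "product": "Echo Dot (2nd Generation)", "amount": 400}
notification = json.dumps({"Type": "Notification", "Message": json.dumps(order)})
parsed = extract_order(notification)
```

The helper also accepts the simplified form shown above, where Message is already an object, so the same code works against both shapes.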

To keep our demo application simple, the callback endpoint does not implement a stock validation, but be aware that on a production system it must.

The callback endpoint also creates a payment intent with the Stripe API to simulate a payment transaction.

Deploying the Demo Application

To deploy the application, follow the instructions in each repository’s README file. If it’s your first time deploying an AWS Fargate application, this blog post by Epsagon will be helpful.

Finally, to create the Amazon SNS topic and subscriptions, follow the instructions in the Fargate documentation.

Enabling Automated Tracing

Once the demo application is running on Fargate and the Amazon SNS topic is in place, set up your free Epsagon account.

Once you hit the initial HTTP endpoint, you’ll be able to follow a full trace from end to end.

Figure 2 – Displaying the full trace within Epsagon, including resource-specific data.

Another interesting view provided by Epsagon is the timeline chart, which helps you identify performance issues and bottlenecks in your distributed application.

Figure 3 – Displaying a detailed timeline breakdown to identify performance problems.

In addition, Epsagon monitors the state of all clusters and services in your account, providing an overview of their health status, and drilling down to the performance metrics of each and every task.

Figure 4 – Dashboard showing a high-level overview of services and tasks running on AWS.

Conclusion

Applications built with a distributed microservices architecture require automated, distributed tracing tools that provide end-to-end observability within the AWS Cloud and outside of it.

Full traceability should no longer be considered a “nice to have” mechanism. Teams should be able to obtain insights about anomalies faster, so they can react smarter by eliminating the guesswork when searching for errors in production. You can only achieve this by making your system a white box.

With Epsagon’s distributed tracing, you can achieve observability with no agents. Moreover, Epsagon fully supports automated tracing on AWS Lambda, and with little code you can achieve full traceability on AWS Fargate. There is no need for heavy lifting, maintenance, or training to trace applications.

Epsagon – APN Partner Spotlight

Epsagon is an AWS Competency Partner specializing in automated tracing for cloud microservices. Its solution builds on AWS tools by providing automated end-to-end tracing across distributed AWS services, and services outside of AWS.
