Access Logging Made Easy with AWS App Mesh and Fluent Bit
NOTICE: October 04, 2024 – This post no longer reflects the best guidance for configuring a service mesh with Amazon ECS and Amazon EKS, and its examples no longer work as shown. For workloads running on Amazon ECS, please refer to newer content on Amazon ECS Service Connect, and for workloads running on Amazon EKS, please refer to Amazon VPC Lattice.
——–
I’ve found that the term microservices can have different meanings and benefits depending on who you talk to. However, the one benefit where I’ve typically found consensus is that microservices give your teams the freedom to choose the best tool for each job. In other words, a microservices architecture shouldn’t follow a “one size fits all” approach.
While this approach enables your teams to work independently, you can run into challenges as your microservices architecture grows. One challenge of a polyglot microservices architecture is normalizing access logs from different services into a consistent format as they are sent to a centralized logging solution. Imagine trying to find a particular error or status code across services that interact with each other when their logs share no consistent structure. Moreover, imagine maintaining all the different parsers you need to ingest that data into a logging solution. You don’t want to spend your cycles here, because it takes away from the innovation and quicker time-to-market that microservices architectures are supposed to bring to your organization.
This challenge is just one reason why AWS App Mesh is great for unifying your services behind a service mesh. App Mesh gives you visibility into the communications between the various services in your environment, while making it easy to monitor, control, and debug those communications. App Mesh uses Envoy as a proxy in front of your containers in the service mesh, which allows you to generate access logs in a consistent format. In this blog post, you will learn how to implement a consistent and structured log format for your microservices applications with AWS App Mesh and Fluent Bit.
Understanding Envoy and the Envoy Access Logs
Before we begin, let’s cover some basics around Envoy and App Mesh. As mentioned earlier, App Mesh uses the open source Envoy proxy, making it compatible with a wide range of AWS Partner Network (APN) technology partners and open source tools. This provides you with consistent visibility and network traffic controls for services built across multiple types of compute infrastructure.
When it comes to access logs, the Envoy proxy generates each entry from a format string: a plain string containing command operators that are replaced with the details of an HTTP request. Below is the format Envoy uses for the access logs:
Here is an example of the default Envoy access log format:
As you can see, there is a wealth of information here. In order for it to be easily searchable in a downstream log collection system, we need to structure these messages into JSON. Now that you understand the basics, let’s dive into two different examples.
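To make the goal concrete, here is a minimal Python sketch of what structuring one of these lines into JSON means. The regular expression is a hand translation of the Fluent Bit parser shown in the EKS example later in this post (Python spells named groups as `(?P<name>...)`), and the sample line is the one the colorteller-black virtual node emits in the FireLens example. The function name `parse_envoy_line` is made up for illustration, and the sketch assumes Envoy is using its default access log format. In the actual setup Fluent Bit does this work, not Python.

```python
import json
import re

# Python translation of the Envoy parser regex used later in this post.
# Assumption: Envoy is writing its default access log format.
ENVOY_ACCESS_LOG = re.compile(
    r'^\[(?P<start_time>[^\]]*)\] '
    r'"(?P<method>\S+)(?: +(?P<path>[^"]*?)(?: +\S*)?)? (?P<protocol>\S+)" '
    r'(?P<code>[^ ]*) (?P<response_flags>[^ ]*) '
    r'(?P<bytes_received>[^ ]*) (?P<bytes_sent>[^ ]*) '
    r'(?P<duration>[^ ]*) (?P<x_envoy_upstream_service_time>[^ ]*) '
    r'"(?P<x_forwarded_for>[^ ]*)" "(?P<user_agent>[^"]*)" '
    r'"(?P<request_id>[^"]*)" "(?P<authority>[^ ]*)" "(?P<upstream_host>[^ ]*)"'
)


def parse_envoy_line(line: str) -> dict:
    """Return the access log fields as a dict, or an empty dict if no match."""
    match = ENVOY_ACCESS_LOG.match(line)
    return match.groupdict() if match else {}


# Sample line taken from the FireLens example in this post.
sample = (
    '[2020-01-23T16:32:40.781Z] "GET / HTTP/1.1" 200 - 0 5 0 0 "-" '
    '"Go-http-client/1.1" "0ed75cb8-a563-9ca3-8ff0-2d8eab307e3e" '
    '"colorteller.appmesh-demo:9080" "127.0.0.1:9080"\n'
)

fields = parse_envoy_line(sample)
print(json.dumps(fields, indent=2, sort_keys=True))
```

Every value comes out as a string, which matches the parsed records shown later in this post. A line that does not follow the default format yields an empty dict instead of raising an error.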
FireLens Example: Parse Envoy Access Logs from AWS App Mesh
This example assumes you have some level of familiarity with AWS App Mesh, Amazon ECS on AWS Fargate, and FireLens for Amazon ECS. In order to demonstrate a microservices application running in a service mesh, we will leverage the Color App as our example application.
Once you’ve created your environment, you need to turn on access logging for Envoy in App Mesh, which is simple to do. If you aren’t familiar with virtual nodes in App Mesh, each one is a logical pointer to a discoverable service such as an ECS or Kubernetes service. When you create your virtual nodes, you have the option to configure the path for Envoy’s access logs.
Here is an example from the console where we are logging to /dev/stdout, which is recommended so you can configure FireLens to send to a destination like Amazon CloudWatch Logs:
Once you’ve done this for one of the virtual nodes like colorteller-black, you see something like this in your access logs:
{
    "log": "[2020-01-23T16:32:40.781Z] \"GET / HTTP/1.1\" 200 - 0 5 0 0 \"-\" \"Go-http-client/1.1\" \"0ed75cb8-a563-9ca3-8ff0-2d8eab307e3e\" \"colorteller.appmesh-demo:9080\" \"127.0.0.1:9080\"\n",
    "stream": "stdout",
    "time": "2020-01-23T16:32:49.400311038Z"
}
As you can see, the log field carries Envoy’s default access log line, but only as a single escaped string inside the JSON record, so none of its fields can be searched individually. In order to parse the log message into something more meaningful, we need to write our own parser for Envoy. At the time of writing, neither the default amazon/aws-for-fluent-bit image nor the fluent/fluent-bit image contains a parser for Envoy’s access logs, but a pull request to add this parser has been submitted to the official Fluent Bit project. With that said, the amazon/aws-for-fluent-bit image does contain a number of parser files under /fluent-bit/parsers for you to use, as these parsers are copied directly from the official Fluent Bit Docker image. To see which parsers are included by default, please see the Fluent Bit GitHub repository.
As you can see, we’ve written a regex to match the default Envoy format we highlighted earlier. Now we need to build the custom Docker image using this Dockerfile:
Once you have the Docker image built, you need to push it to Amazon Elastic Container Registry (ECR). After you’ve uploaded it to ECR, reference your custom Fluent Bit image in your task definitions and add the FireLens-specific values. The Color App example creates multiple task definitions, one for each color in the mesh, and below is an example task definition that wires up your custom Fluent Bit image with Envoy:
appmesh-firelens-colorteller-black-ecs-task-def.json
Here is an overview of the new settings:
- We’ve added a new container called log_router, which references our custom Fluent Bit image in ECR and tells it to use the /fluent-bit/conf/parse_envoy.conf configuration file to parse Envoy’s access logs.
"image": "012345678910.dkr.ecr.us-east-1.amazonaws.com/aws-for-fluent-bit-custom-envoy:latest",
"name": "log_router",
"firelensConfiguration": {
    "type": "fluentbit",
    "options": {
        "enable-ecs-log-metadata": "true",
        "config-file-type": "file",
        "config-file-value": "/fluent-bit/conf/parse_envoy.conf"
    }
}
- We’ve added a new environment variable to the Envoy container: {"name":"ENVOY_LOG_LEVEL","value":"info"}. This optional value lets you specify the log level for the Envoy container, which you can read more about here. If you want, you can export only the Envoy access logs (and ignore the other Envoy container logs) by setting ENVOY_LOG_LEVEL to off. With that said, I wouldn’t recommend turning these logs off in your production environments, because you might need them to diagnose issues with Envoy itself.
"environment": [
    {
        "name": "APPMESH_VIRTUAL_NODE_NAME",
        "value": "mesh/color-mesh/virtualNode/colorteller-black-appmesh-demo"
    },
    {
        "name": "ENVOY_LOG_LEVEL",
        "value": "info"
    }
],
- We’ve changed the Envoy container’s logDriver to use awsfirelens to wire up our Fluent Bit container to push logs to CloudWatch.
"logConfiguration": {
    "logDriver": "awsfirelens",
    "options": {
        "Name": "cloudwatch",
        "region": "us-east-1",
        "log_group_name": "appmesh-firelens",
        "auto_create_group": "true",
        "log_stream_prefix": "envoy-black-"
    }
}
- From this point, it’s as simple as registering the task definition you’ve updated and updating the service to use the latest version of the task definition. You can then navigate to your CloudWatch log group to view the parsed Envoy logs in JSON.
{
    "authority": "colorteller.appmesh-demo:9080",
    "bytes_received": "0",
    "bytes_sent": "6",
    "code": "200",
    "container_id": "32561e17b9b943cc6a07d8db68d2d0c921fe0e9daafa9e4c7d402fc36eaf3196",
    "container_name": "/ecs-appmesh-firelens-6-envoy-d4b2bcf39bd698b9a101",
    "duration": "0",
    "ecs_cluster": "arn:aws:ecs:us-east-1:012345678910:cluster/appmesh-firelens",
    "ecs_task_arn": "arn:aws:ecs:us-east-1:012345678910:task/b69367b1-d558-4116-9b9f-18dfcae657d1",
    "ecs_task_definition": "appmesh-firelens-colorteller-black:6",
    "method": "GET",
    "path": "/",
    "protocol": "HTTP/1.1",
    "request_id": "3a1957c3-3d47-9259-bdc6-f88ebc4b3da7",
    "response_flags": "-",
    "source": "stdout",
    "start_time": "2020-02-03T19:03:22.305Z",
    "upstream_host": "127.0.0.1:9080",
    "user_agent": "Go-http-client/1.1",
    "x_envoy_upstream_service_time": "0",
    "x_forwarded_for": "-"
}
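The three task definition fragments in this section live in different places: the first is its own container entry, while the environment variables and the logConfiguration both belong to the Envoy container. As a rough sketch of how they fit together, the Python below assembles them into a single containerDefinitions list, parameterized by color. The helper names and the `envoy` container name are placeholders chosen for illustration. Every other setting a real task definition needs (the Envoy image, ports, proxy configuration, the application container) is deliberately left out, and the account ID, region, and names are simply the sample values from the fragments.

```python
import json

# Sample image URI from the task definition fragment in this post.
FLUENT_BIT_IMAGE = (
    "012345678910.dkr.ecr.us-east-1.amazonaws.com/"
    "aws-for-fluent-bit-custom-envoy:latest"
)


def log_router_container() -> dict:
    """FireLens log router that runs the custom Fluent Bit image."""
    return {
        "name": "log_router",
        "image": FLUENT_BIT_IMAGE,
        "firelensConfiguration": {
            "type": "fluentbit",
            "options": {
                "enable-ecs-log-metadata": "true",
                "config-file-type": "file",
                "config-file-value": "/fluent-bit/conf/parse_envoy.conf",
            },
        },
    }


def envoy_logging_settings(color: str, envoy_log_level: str = "info") -> dict:
    """Logging-related settings of the Envoy container for one color."""
    return {
        "name": "envoy",  # placeholder container name (assumption)
        "environment": [
            {
                "name": "APPMESH_VIRTUAL_NODE_NAME",
                "value": f"mesh/color-mesh/virtualNode/colorteller-{color}-appmesh-demo",
            },
            {"name": "ENVOY_LOG_LEVEL", "value": envoy_log_level},
        ],
        "logConfiguration": {
            "logDriver": "awsfirelens",
            "options": {
                "Name": "cloudwatch",
                "region": "us-east-1",
                "log_group_name": "appmesh-firelens",
                "auto_create_group": "true",
                "log_stream_prefix": f"envoy-{color}-",
            },
        },
    }


container_definitions = [log_router_container(), envoy_logging_settings("black")]
print(json.dumps(container_definitions, indent=2))
```

Generating the fragments from one function per container keeps the per-color task definitions consistent, since only the virtual node name and the log stream prefix change between colors.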
Fluent Bit Example: Parse Envoy Access Logs from AWS App Mesh on Amazon EKS
This example assumes you have some level of familiarity with AWS App Mesh, Amazon EKS, and Fluent Bit. In order to demonstrate a microservices application running in a service mesh, we will once again leverage the Color App. If you would like to test this out yourself on your EKS cluster, be sure to follow the documentation.
When it comes to getting the Envoy logs out of your applications running in EKS, it’s essentially the same process as the FireLens example above. Before we get into the Envoy logs, though, it’s important to note that Kubernetes does not provide a native storage solution for log data; instead, you can integrate many existing logging solutions into your Kubernetes cluster. A common way to ingest, parse, and forward your logs on EKS is to implement a Fluent Bit DaemonSet on your EKS worker nodes as shown below:
As the example below shows, Fluent Bit reads the logs emitted by your applications through an input. All of this is controlled by a ConfigMap, which gives you a powerful way to decouple configuration artifacts from image content to keep containerized applications portable. The full ConfigMap used for this example can be found here:
input-kubernetes.conf: |
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/*.log
        Parser            docker
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On
        Refresh_Interval  10
In order for Fluent Bit to understand the Envoy logs coming from App Mesh, it’s simply a matter of implementing our Envoy regex as a parser in the ConfigMap.
[PARSER]
    Name        envoy
    Format      regex
    Regex       ^\[(?<start_time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)? (?<protocol>\S+)" (?<code>[^ ]*) (?<response_flags>[^ ]*) (?<bytes_received>[^ ]*) (?<bytes_sent>[^ ]*) (?<duration>[^ ]*) (?<x_envoy_upstream_service_time>[^ ]*) "(?<x_forwarded_for>[^ ]*)" "(?<user_agent>[^\"]*)" "(?<request_id>[^\"]*)" "(?<authority>[^ ]*)" "(?<upstream_host>[^ ]*)"
    Time_Format %Y-%m-%dT%H:%M:%S.%L%z
    Time_Keep   On
    Time_Key    start_time
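The last three settings of the parser deal with time: Time_Key names the field that carries the event time, Time_Format describes how to read it, and Time_Keep On keeps the original field in the parsed record. As a rough illustration of what that format string matches, the Python sketch below parses one of Envoy's timestamps with the standard library. Python has no `%L` directive, so `%f` (fractional seconds) stands in for it here. That substitution and the variable names are assumptions of this sketch, not Fluent Bit behavior, and accepting a trailing `Z` for `%z` requires Python 3.7 or later.

```python
from datetime import datetime, timezone

# Fluent Bit Time_Format from the parser: %Y-%m-%dT%H:%M:%S.%L%z
# Closest Python equivalent (assumption: %f stands in for %L).
PYTHON_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"

# Value of the Time_Key field (start_time) from an Envoy access log line.
start_time = "2020-02-05T16:37:27.958Z"

# strptime returns a timezone-aware datetime because of the %z directive.
event_time = datetime.strptime(start_time, PYTHON_TIME_FORMAT)

print(event_time.isoformat())
print(event_time.tzinfo)
```

The result is a timezone-aware UTC datetime with millisecond precision, which is the level of detail Envoy writes into start_time.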
If you look at the example color.yaml, you will notice it’s the same manifest that’s used in the Color App from the AWS App Mesh documentation. However, it’s been slightly modified with the following:
- Configure the colorteller-black VirtualNode in App Mesh to emit the Envoy access logs to /dev/stdout
apiVersion: appmesh.k8s.aws/v1beta1
kind: VirtualNode
metadata:
  name: colorteller-black
  namespace: appmesh-demo
spec:
  meshName: color-mesh
  listeners:
    - portMapping:
        port: 9080
        protocol: http
  serviceDiscovery:
    dns:
      hostName: colorteller-black.appmesh-demo.svc.cluster.local
  logging:
    accessLog:
      file:
        path: /dev/stdout
- It also adds an annotation of fluentbit.io/parser: envoy to the colorteller-black deployment.
template:
  metadata:
    labels:
      app: colorteller
      version: black
    annotations:
      fluentbit.io/parser: envoy
This annotation suggests that the data should be processed using the pre-defined parser called envoy, which we defined in fluent-bit-configmap.yaml. I think this is one of the most powerful features of the Fluent Bit parser, because parser configuration can be suggested per pod using annotations instead of being centralized in the ConfigMap. This is how we wire up Fluent Bit to parse the Envoy access logs for App Mesh. Please see this link for more info on pre-defined parsers in Fluent Bit.
From this point on, all of your colorteller-black Envoy access logs should look like the below example in CloudWatch. You will see that the logs are now structured in a key called log_processed which is also defined in the config map. You can read more about this in the Kubernetes filter for Fluent Bit documentation.
{
    "kubernetes": {
        "annotations": {
            "fluentbit.io/parser": "envoy",
            "kubernetes.io/psp": "eks.privileged"
        },
        "container_hash": "b10687cb4b94ef7aecc0c6e815efb56c8d8889db5316bafc42477acd908a0e91",
        "container_name": "envoy",
        "docker_id": "3be97d47d717ae3ba9937d1bd58b683cdb8b24bd5c66cf7fefeb2ee47c808b08",
        "host": "ip-192-168-10-112.ec2.internal",
        "labels": {
            "app": "colorteller",
            "pod-template-hash": "d868b5bc9",
            "version": "black"
        },
        "namespace_name": "appmesh-demo",
        "pod_id": "42336070-4796-11ea-ac15-0278ee4c2031",
        "pod_name": "colorteller-black-d868b5bc9-zwz28"
    },
    "log": "[2020-02-05T16:37:27.958Z] \"GET / HTTP/1.1\" 200 - 0 5 0 0 \"-\" \"Go-http-client/1.1\" \"6c6ae4b8-60ac-98b6-ba32-a3a4a5c938bf\" \"colorteller.appmesh-demo:9080\" \"127.0.0.1:9080\"\n",
    "log_processed": {
        "authority": "colorteller.appmesh-demo:9080",
        "bytes_received": "0",
        "bytes_sent": "5",
        "code": "200",
        "duration": "0",
        "method": "GET",
        "path": "/",
        "protocol": "HTTP/1.1",
        "request_id": "6c6ae4b8-60ac-98b6-ba32-a3a4a5c938bf",
        "response_flags": "-",
        "start_time": "2020-02-05T16:37:27.958Z",
        "upstream_host": "127.0.0.1:9080",
        "user_agent": "Go-http-client/1.1",
        "x_envoy_upstream_service_time": "0",
        "x_forwarded_for": "-"
    },
    "stream": "stdout",
    "time": "2020-02-05T16:37:33.976469719Z"
}
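Once every service's access log arrives with the same keys under log_processed, a question like "which requests failed?" becomes a simple filter. The Python sketch below runs such a filter over records shaped like the one above. The helper name `server_errors` is invented for this sketch, and the second, failing record is entirely hypothetical: it is a copy of the real record with a made-up 503 status so that the filter has something to find.

```python
import json

# Trimmed copy of the CloudWatch record shown above.
real_record = {
    "kubernetes": {
        "container_name": "envoy",
        "pod_name": "colorteller-black-d868b5bc9-zwz28",
    },
    "log_processed": {
        "code": "200",
        "method": "GET",
        "path": "/",
        "request_id": "6c6ae4b8-60ac-98b6-ba32-a3a4a5c938bf",
        "response_flags": "-",
    },
}

# HYPOTHETICAL record: same shape, with an invented failure for the demo.
fake_failure = json.loads(json.dumps(real_record))  # cheap deep copy
fake_failure["log_processed"].update(
    {"code": "503", "response_flags": "UF", "request_id": "hypothetical-id"}
)


def server_errors(records):
    """Yield (pod_name, request_id, code) for every 5xx response."""
    for record in records:
        fields = record.get("log_processed", {})
        code = fields.get("code", "")
        # Parsed values are strings, so convert before comparing.
        if code.isdigit() and 500 <= int(code) <= 599:
            yield (record["kubernetes"]["pod_name"], fields["request_id"], code)


for hit in server_errors([real_record, fake_failure]):
    print(hit)
```

Because the key names are identical for every service behind the mesh, the same filter works regardless of the language each service is written in.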
Conclusion
In this post, I showed you how easy it is to implement a consistent and structured log format for the access logs of your microservices applications with AWS App Mesh and Fluent Bit. As you can see, it doesn’t matter which container orchestrator or even which language your teams choose. Leveraging the out-of-the-box functionality provided by App Mesh and Envoy’s access logs gives you the foundation to implement a consistent logging structure in your environment.
If you’d like to learn more about FireLens, take a look at this great webinar by Wesley Pettit and also this deep dive blog by Wesley that will teach you how to split an application’s log output into multiple streams.
We are excited to hear about your use cases so please open up an issue on the AWS containers roadmap or the AWS App Mesh roadmap on GitHub if there is anything you would like to see.