Containers
Analyze Kubernetes container logs using Amazon S3 and Amazon Athena
Logs are crucial to understanding any system’s behavior and performance. For postmortem analysis of software, along with traces and metrics, logs can be the closest thing to having a time machine. A dilemma many developers have traditionally faced is what to log and what not to. This predicament has led to either too many logs or, worse, not enough. Historically, high storage costs have forced developers to reduce the level of detail captured in application logs. But cloud computing has reduced the cost of storage significantly. Services like Amazon S3 offer customers cost-efficient and durable storage for virtually unlimited amounts of data, data that can then be analyzed as-is and at scale using Amazon Athena and Amazon Redshift Spectrum.
We will demonstrate how you can capture Kubernetes application logs using Fluent Bit, store them in Amazon S3, and analyze them using Amazon Athena. At the crux of the solution is Fluent Bit, an open source log processor and forwarder that allows you to collect logs from different sources, and unify and send them to multiple destinations. Fluent Bit plugins support various AWS and partner monitoring solutions, including Amazon CloudWatch, Amazon Kinesis, Datadog, Splunk, and Amazon S3.
For log analysis, we use Amazon Athena, an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to set up, manage, or pay for. You are charged for the amount of data scanned by each query you run. You have the ability to analyze hundreds of terabytes of data without any upfront or recurring infrastructure costs.
Architecture
The reference architecture we propose in this post uses Fluent Bit to collect container logs produced by a sample Python application running in an Amazon EKS cluster. Fluent Bit runs as a DaemonSet and ships logs to an S3 bucket for permanent retention. Once the logs are available in Amazon S3, we use Amazon Athena to analyze them.
You will need the following to complete the tutorial:
- AWS CLI version 2
- eksctl
- kubectl
- Docker
- an EKS cluster. See creating an Amazon EKS cluster if you don’t have one.
- an S3 bucket
Let’s start by setting a few environment variables:
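The original commands aren’t reproduced here; a minimal sketch might look like the following, where AWS_REGION, AWS_ACCOUNT_ID, CLUSTER_NAME, and S3_BUCKET are placeholder names we assume for the rest of the walkthrough:

```bash
# Placeholder values -- adjust to match your environment.
export AWS_REGION=us-east-1
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export CLUSTER_NAME=my-eks-cluster   # replace with your EKS cluster name
export S3_BUCKET=my-fluent-bit-logs  # replace with your S3 bucket name
```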
You can use the AWS CLI to find out the name of your EKS cluster by listing EKS clusters in your AWS Region:
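For example, the following lists the cluster names in the Region set above:

```bash
aws eks list-clusters --region $AWS_REGION --query 'clusters' --output text
```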
Deploy the sample application
The post provides a mock e-commerce ordering application that generates dummy logs that contain sales records in JSON-encoded format. To use the sample app, you can create a Docker image and push it to an ECR repository in your account.
Create a Python script by running the command:
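The script itself isn’t included here, so the following is a minimal sketch of what such a generator could look like. The file name app.py, the field names (orderId, item, price, quantity, state), and the sample values are assumptions made for illustration:

```bash
cat <<'EoF' > app.py
# Hypothetical sketch of the mock ordering app: emits one JSON-encoded
# sales record per second to stdout so the container runtime captures it.
import json
import random
import time

STATES = ["CA", "TX", "NY", "WA", "FL"]
ITEMS = ["book", "laptop", "phone", "monitor"]

while True:
    record = {
        "orderId": random.randint(1000, 9999),
        "item": random.choice(ITEMS),
        "price": round(random.uniform(5, 500), 2),
        "quantity": random.randint(1, 5),
        "state": random.choice(STATES),
    }
    print(json.dumps(record), flush=True)
    time.sleep(1)
EoF
```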
Create a Dockerfile:
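A minimal Dockerfile for the hypothetical script above might be (the base image is an assumption):

```bash
cat <<'EoF' > Dockerfile
# Run the mock ordering app with an unbuffered Python interpreter.
FROM public.ecr.aws/docker/library/python:3.9-slim
WORKDIR /app
COPY app.py .
CMD ["python", "-u", "app.py"]
EoF
```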
Build the image:
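Assuming the hypothetical image name eks-log-sample-app:

```bash
docker build -t eks-log-sample-app:latest .
```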
Create an ECR repository and push the image:
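A sketch of those steps, reusing the hypothetical repository name eks-log-sample-app:

```bash
aws ecr create-repository --repository-name eks-log-sample-app --region $AWS_REGION

# Authenticate Docker to your private registry, then tag and push the image.
aws ecr get-login-password --region $AWS_REGION | \
  docker login --username AWS --password-stdin $AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com

docker tag eks-log-sample-app:latest \
  $AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/eks-log-sample-app:latest
docker push $AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/eks-log-sample-app:latest
```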
Fluent Bit IAM role configuration
In this demo, we want to analyze logs produced by the sample application. Suppose we are interested in analyzing the log entries for sales in California. We can use Fluent Bit to filter log records with CA in the state field and send them to an S3 bucket, while the rest of the logs go to CloudWatch Logs.
We will get into how we filter logs using Fluent Bit shortly. First, the Fluent Bit pods need an IAM role to be able to write logs to the S3 bucket and CloudWatch Logs. We have to create and associate an OIDC provider with the EKS cluster so pods can assume IAM roles. eksctl can automate this with a single command:
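With eksctl, the association is a single command:

```bash
eksctl utils associate-iam-oidc-provider \
  --cluster $CLUSTER_NAME \
  --region $AWS_REGION \
  --approve
```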
Now, create a Kubernetes service account in the cluster. This service account has an associated IAM role with permissions to write to S3 buckets and CloudWatch Logs. In production, you should create a fine-grained IAM policy that only permits writes to a specific S3 bucket.
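One way to do this in a single step is eksctl create iamserviceaccount. The namespace fluent-bit, the service account name, and the broad managed policies below are assumptions for this demo; as noted above, use narrower permissions in production:

```bash
kubectl create namespace fluent-bit

eksctl create iamserviceaccount \
  --cluster $CLUSTER_NAME \
  --region $AWS_REGION \
  --name fluent-bit \
  --namespace fluent-bit \
  --attach-policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess \
  --attach-policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy \
  --approve \
  --override-existing-serviceaccounts
```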
Deploy Fluent Bit
Create the required ClusterRole and ClusterRoleBinding for Fluent Bit:
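A sketch of those RBAC objects, assuming the fluent-bit namespace and service account created above (the kubernetes filter used later needs read access to pod and namespace metadata):

```bash
cat <<EoF | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluent-bit-read
rules:
  - apiGroups: [""]
    resources: ["namespaces", "pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fluent-bit-read
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluent-bit-read
subjects:
  - kind: ServiceAccount
    name: fluent-bit
    namespace: fluent-bit
EoF
```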
Fluent Bit stores its configuration in a Kubernetes ConfigMap. We need to create a Fluent Bit ConfigMap that includes log input and output details. The [INPUT] section points to the local filesystem directory that stores container logs, which is /var/log/containers/*.log in Kubernetes. The [OUTPUT] section defines the destination where Fluent Bit transmits container logs for retention. In the current scenario, the outputs will be S3 and CloudWatch Logs.
Fluent Bit supports multiple input and output streams. Using tags, you can route input streams to various output destinations instead of storing every kind of log in one destination. As an example, the Fluent Bit ConfigMap below has one input and two outputs. The input matches any log file in /var/log/containers/.
We use Fluent Bit stream processing to inspect each log entry and, depending on whether its state field equals ‘CA’, send it to one of the two destinations.
The sample application generates fake sales records and logs them in this format:
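A record might look like the following; the field names match the hypothetical generator script sketched earlier and are illustrative only:

```json
{"orderId": 4821, "item": "laptop", "price": 329.75, "quantity": 2, "state": "CA"}
```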
We are interested in analyzing log entries where the state key has ‘CA’ as its value. We create two Fluent Bit stream processors (called STREAM_TASK in the Fluent Bit ConfigMap): the first processor looks for state = ‘CA’ and sends matching records to the S3 bucket; the second processor looks for state != ‘CA’ and sends matching records to CloudWatch Logs. If you want to send all records to CloudWatch irrespective of the content, you can configure the output to match the input’s tag like this:
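For instance, assuming the tail input below is tagged kube.*, an output like this would receive every record (the Region and log group name are placeholders):

```
[OUTPUT]
    Name              cloudwatch_logs
    Match             kube.*
    region            us-east-1
    log_group_name    fluent-bit-cloudwatch
    log_stream_prefix from-fluent-bit-
    auto_create_group On
```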
You can customize these rules to fit your scenario. For example, you can send DEBUG-level logs to S3 and everything else to CloudWatch, as explained in splitting an application’s logs into multiple streams: a Fluent tutorial.
Create a config map for Fluent Bit:
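The full ConfigMap isn’t reproduced here. The following condensed sketch shows its overall shape under the assumptions made so far: the fluent-bit namespace, a kubernetes filter with Merge_Log On so the application’s JSON keys (including state) are lifted to the top level of each record, and two stream processor tasks that route on that key. Treat the option values as starting points rather than a drop-in configuration:

```bash
cat <<EoF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: fluent-bit
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush         5
        Log_Level     info
        Parsers_File  parsers.conf
        Streams_File  stream_processor.conf

    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/*.log
        Parser            docker
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On

    # Enrich records with Kubernetes metadata and merge the JSON payload
    # from the log field into the top level of each record.
    [FILTER]
        Name        kubernetes
        Match       kube.*
        Merge_Log   On
        Keep_Log    Off

    # Records re-tagged by the s3_task stream processor go to S3.
    [OUTPUT]
        Name              s3
        Match             s3stream
        bucket            ${S3_BUCKET}
        region            ${AWS_REGION}
        store_dir         /var/log/fluent-bit-buffer
        total_file_size   30M
        upload_timeout    3m

    # Records re-tagged by the cw_task stream processor go to CloudWatch Logs.
    [OUTPUT]
        Name              cloudwatch_logs
        Match             cwstream
        region            ${AWS_REGION}
        log_group_name    fluent-bit-cloudwatch
        log_stream_prefix from-fluent-bit-
        auto_create_group On

  stream_processor.conf: |
    [STREAM_TASK]
        Name   s3_task
        Exec   CREATE STREAM s3stream WITH (tag='s3stream') AS SELECT * FROM TAG:'kube.*' WHERE state = 'CA';

    [STREAM_TASK]
        Name   cw_task
        Exec   CREATE STREAM cwstream WITH (tag='cwstream') AS SELECT * FROM TAG:'kube.*' WHERE state != 'CA';

  parsers.conf: |
    [PARSER]
        Name        docker
        Format      json
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   On
EoF
```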
In a real-world use case, you can have many inputs and outputs. For example, you can send low priority raw logs to an S3 bucket and send other logs to Amazon CloudWatch, or any other Fluent Bit supported destination.
The Fluent Bit S3 output plugin buffers data locally in its store_dir, which we have set to a directory on the node’s filesystem. We do this so that data will still be sent even if the Fluent Bit pod suddenly stops and restarts. We’ve set maximum file size and a timeout so that each uploaded file is never more than 30 MB, and data is uploaded at least once every 3 minutes (even if less than 30 MB have been received). Fluent Bit uses multipart uploads to send larger files in chunks; hence, only a minimal amount of data is buffered at any point in time.
The next step is to create the Fluent Bit DaemonSet, which runs a pod on each node in the Kubernetes cluster; each pod monitors the node’s filesystem for logs, buffers them, and ships them to the configured destinations.
We need to find out the image repository and version to create the Fluent Bit DaemonSet. We can use AWS Systems Manager to get this information:
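For example, the public SSM parameter published for the AWS for Fluent Bit image can be read like this; the returned value should contain the image URI to use in the DaemonSet:

```bash
aws ssm get-parameters \
  --names /aws/service/aws-for-fluent-bit/latest \
  --region $AWS_REGION \
  --query 'Parameters[0].Value' \
  --output text
```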
This command requires AWS CLI version 2. If you’re using AWS CLI version 1 and the command above doesn’t work, you can find out the image version and repository by following the instructions at AWS for Fluent Bit GitHub repository.
Create the Fluent Bit DaemonSet:
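The full manifest isn’t reproduced here; a condensed sketch, assuming the fluent-bit namespace, service account, and ConfigMap from the previous steps, might look like this (substitute the image URI returned by the SSM query above):

```bash
cat <<EoF | kubectl apply -f -
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: fluent-bit
spec:
  selector:
    matchLabels:
      k8s-app: fluent-bit
  template:
    metadata:
      labels:
        k8s-app: fluent-bit
    spec:
      serviceAccountName: fluent-bit
      containers:
        - name: fluent-bit
          image: <image URI from the SSM parameter>
          volumeMounts:
            # /var/log is a hostPath mount, so the S3 plugin's store_dir
            # buffer survives pod restarts.
            - name: varlog
              mountPath: /var/log
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: fluent-bit-config
              mountPath: /fluent-bit/etc/
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: fluent-bit-config
          configMap:
            name: fluent-bit-config
EoF
```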
Verify that Fluent Bit Pods are running:
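For example:

```bash
kubectl get pods -n fluent-bit -o wide
```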
Generate logs
Now that the logging infrastructure is operational, it’s time to test it by generating logs. Apply the manifest below to create a deployment with three pods running the image you pushed to your ECR repository earlier.
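The original manifest isn’t reproduced here; a sketch, assuming the ECR image pushed earlier and a hypothetical application name of ordering-app, could be:

```bash
cat <<EoF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ordering-app
  labels:
    app: ordering-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ordering-app
  template:
    metadata:
      labels:
        app: ordering-app
    spec:
      containers:
        - name: ordering-app
          image: ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/eks-log-sample-app:latest
EoF
```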
Verify that the sample application’s pods are running:
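For example, using the hypothetical app=ordering-app label:

```bash
kubectl get pods -l app=ordering-app
```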
Once the sample application pods are running, you can check Fluent Bit logs to verify that logs are being pushed to S3 successfully.
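For example, using the k8s-app=fluent-bit label from the DaemonSet sketch above:

```bash
kubectl logs -n fluent-bit -l k8s-app=fluent-bit --tail=20
```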
In the Fluent Bit logs, look for messages from the S3 output plugin indicating successful uploads to your bucket.
Query logs using Athena
Fluent Bit is sending the logs that the sample application creates to the S3 bucket. Below, you will see the folder structure of the S3 bucket; Fluent Bit stores logs in a Hive-like layout, partitioned by date and time.
S3 bucket contents
Amazon Athena allows you to query data in S3 without setting up or maintaining any infrastructure. With Athena, you can:
- Query data using ANSI SQL. You don’t need to learn a new query language.
- Perform complex analysis including large joins, window functions, and arrays.
- Cost-optimize storage. You can store data in S3 rather than a costly database.
To analyze logs stored in S3, we now need to navigate to the Amazon Athena console and create a table. But before that, let’s take a look at what happens to log entries as they go through different systems.
The sample application logs transaction details in JSON format to standard output (stdout), as shown in the sample record earlier.
When the container runtime saves those logs to the local filesystem, it adds metadata to the application’s log entries, and the transformed log entry looks like this:
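For example, with the Docker json-file logging driver, the wrapped entry has this general shape (the values are illustrative):

```json
{"log": "{\"orderId\": 4821, \"item\": \"laptop\", \"price\": 329.75, \"quantity\": 2, \"state\": \"CA\"}\n", "stream": "stdout", "time": "2021-06-01T18:30:04.123456789Z"}
```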
Then, Fluent Bit adds its metadata to each log entry, so the same log entry from above looks like this in the file saved on the S3 bucket.
To analyze the logs stored in the files in the S3 bucket, we need to parse the log entries and convert fields into rows and columns. Athena uses a SerDe (Serializer/Deserializer) to interact with data in different formats (the Athena documentation includes a list of supported SerDes). Since the log entries are JSON-encoded, we can use the OpenX JSON SerDe.
Open Amazon Athena in the AWS Management Console and create an Athena table using DDL:
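The exact DDL depends on the record shape your Fluent Bit configuration produces. Under the assumptions in this post (application keys merged to the top level of each record), a sketch could look like the following; the column names mirror the hypothetical sample fields, the OpenX SerDe simply ignores JSON keys that have no matching column, and you should replace the bucket name and prefix in LOCATION:

```sql
CREATE EXTERNAL TABLE IF NOT EXISTS eks_fb_s3 (
  orderid  int,
  item     string,
  price    double,
  quantity int,
  state    string,
  stream   string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://<your-s3-bucket-name>/<prefix>/';
```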
Ensure that you enter the name of your S3 bucket in the LOCATION clause.
The command above creates a table called eks_fb_s3. You can see a sample of the data in the eks_fb_s3 table by running the following query:
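For example:

```sql
SELECT * FROM eks_fb_s3 LIMIT 10;
```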
Notice that the table contains only records where state = CA. Meanwhile, the log entries for other states are sent to CloudWatch Logs. Head back to the AWS CLI and run the command below to see the application logs in CloudWatch:
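Assuming the log group name fluent-bit-cloudwatch from the ConfigMap sketch, one way to view recent entries is:

```bash
aws logs tail fluent-bit-cloudwatch --region $AWS_REGION --since 10m
```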
The result shouldn’t contain any records for sales in California.
Cleanup
Use the following commands to delete resources created during this post:
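The exact commands depend on what you created; the following mirror the hypothetical names used in the sketches above:

```bash
# Sample application and cluster-scoped RBAC objects
kubectl delete deployment ordering-app
kubectl delete clusterrolebinding fluent-bit-read
kubectl delete clusterrole fluent-bit-read

# IAM role and Kubernetes service account created by eksctl
eksctl delete iamserviceaccount \
  --cluster $CLUSTER_NAME \
  --region $AWS_REGION \
  --name fluent-bit \
  --namespace fluent-bit

# Remaining Fluent Bit objects (DaemonSet, ConfigMap, namespace)
kubectl delete namespace fluent-bit

# Container image repository and log data
aws ecr delete-repository --repository-name eks-log-sample-app --region $AWS_REGION --force
aws s3 rm s3://$S3_BUCKET --recursive
```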
Fluent Bit support for Amazon Kinesis Data Firehose
Many customers use Fluent Bit’s support for Amazon Kinesis Data Firehose to stream logs to Amazon S3. Using Firehose to deliver data to S3 can be more reliable because data is handed off to Firehose more quickly than with Fluent Bit’s direct S3 integration: Firehose acts as a distributed buffer and manages retries. Without Firehose in the middle, Fluent Bit has to handle the buffering and retrying itself. That isn’t a bad thing by itself, but if Fluent Bit (or any underlying component, such as the node or the cluster) fails, any un-transmitted logs could be lost. You can improve Fluent Bit’s reliability by using a persistent volume, as explained here, which makes Fluent Bit look for any previously un-transmitted data upon restart. Even then, it’s possible to lose logs if the container logs are rotated before the Fluent Bit pod restarts and is ready to transmit them.
Be aware of the quotas when using Amazon Kinesis Data Firehose. You may have to request a limit increase if your applications generate large volumes of logs.
Conclusion
Amazon S3 provides cost-effective and scalable storage, which allows you to collect and analyze data using Amazon Athena without incurring high storage and infrastructure costs. You can use Fluent Bit’s S3 plugin to aggregate and transmit logs to Amazon S3, as well as to many other destinations. Fluent Bit’s S3 plugin is designed to handle data at volume, and it optimizes data transfer to S3 using the multipart upload API.
You can learn more about the upcoming features for Fluent Bit’s S3 output plugin on Fluent Bit’s GitHub repository.
Further reading
It’s helpful to understand how container logs are stored on a Kubernetes worker node’s filesystem. Kubernetes configures the container runtime to store logs in JSON format on the node’s local filesystem. In EKS, the Docker container runtime stores container logs at /var/lib/docker/containers/{Container ID}/{Container ID}-json.log. Kubernetes also creates symlinks to these log files in /var/log/pods and /var/log/containers.
The naming format for log files differs in each directory. In /var/log/pods, log file naming follows this scheme: {Kubernetes namespace}_{Pod name}_{Pod UID}. In /var/log/containers, log file naming follows a different scheme: {Pod name}_{Kubernetes namespace}_{Container name}_{Container ID}.
Notice the contents of /var/log/pods and /var/log/containers on an EKS worker node.
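You can inspect them from a shell on the worker node (for example, via SSM Session Manager or SSH):

```bash
ls -l /var/log/pods /var/log/containers
```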
As you can see, the permissions vary in each directory. While any process can read files in /var/log/containers, the permissions in /var/log/pods are more restrictive. /var/log/containers is the preferred source for container logs in Kubernetes.
In Fluent Bit, you can also create inputs that match the log files of a particular pod or deployment. For example, if pods are named “ordering-app”, you can create a Fluent Bit input that monitors all files at /var/log/containers/ordering-app*.log, as shown in the sketch below. This is helpful when running applications that produce logs in different formats; you can create multiple Fluent Bit inputs, process them separately, and store them accordingly.
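For example, an input limited to the hypothetical ordering-app pods could look like this (the tag name is an arbitrary choice):

```
[INPUT]
    Name    tail
    Tag     ordering.*
    Path    /var/log/containers/ordering-app*.log
    Parser  docker
```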
And finally, here are some links that we found useful: