How do I analyze my Application Load Balancer access logs using Amazon Athena?

Last updated: 2022-08-25

I want to analyze my Application Load Balancer access logs with Amazon Athena.

Short description

Elastic Load Balancing doesn't activate access logging by default. When you activate access logging, you specify an Amazon Simple Storage Service (Amazon S3) bucket, and all Application Load Balancer and Classic Load Balancer access logs are stored in that bucket. To troubleshoot or analyze the performance of your load balancer, you can use Athena to analyze the access logs in Amazon S3.

Note: Although you can use Athena to analyze access logs for both Application Load Balancers and Classic Load Balancers, only Application Load Balancers are covered in this article.

Resolution

Create a database and table for Application Load Balancer logs

To analyze access logs in Athena, create a database and table by doing the following:

1.    Open the Athena console.

2.    In the Query Editor, run a command similar to the following to create a database. It's a best practice to create the database in the same AWS Region as the S3 bucket.

CREATE DATABASE alb_db

3.    In the database that you created in the previous step, create a table (for example, alb_logs) for the Application Load Balancer logs. For more information, see Creating the table for Application Load Balancer logs.

Note: For better query performance, you can choose to create a table with partition projection. In partition projection, partition values and locations are calculated from configuration rather than read from a repository, such as the AWS Glue Data Catalog. For more information, see Partition projection with Amazon Athena.

CREATE EXTERNAL TABLE IF NOT EXISTS alb_logs (
            type string,
            time string,
            elb string,
            client_ip string,
            client_port int,
            target_ip string,
            target_port int,
            request_processing_time double,
            target_processing_time double,
            response_processing_time double,
            elb_status_code int,
            target_status_code string,
            received_bytes bigint,
            sent_bytes bigint,
            request_verb string,
            request_url string,
            request_proto string,
            user_agent string,
            ssl_cipher string,
            ssl_protocol string,
            target_group_arn string,
            trace_id string,
            domain_name string,
            chosen_cert_arn string,
            matched_rule_priority string,
            request_creation_time string,
            actions_executed string,
            redirect_url string,
            lambda_error_reason string,
            target_port_list string,
            target_status_code_list string,
            classification string,
            classification_reason string
            )
            PARTITIONED BY
            (
             day STRING
            )
            ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
            WITH SERDEPROPERTIES (
            'serialization.format' = '1',
            'input.regex' = 
            '([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) (.*) (- |[^ ]*)\" \"([^\"]*)\" ([A-Z0-9-_]+) ([A-Za-z0-9.-]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" ([-.0-9]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^ ]*)\" \"([^\s]+?)\" \"([^\s]+)\" \"([^ ]*)\" \"([^ ]*)\"')
            LOCATION 's3://your-alb-logs-directory/AWSLogs/1111222233334444/elasticloadbalancing/<REGION>/'
            TBLPROPERTIES
            (
             "projection.enabled" = "true",
             "projection.day.type" = "date",
             "projection.day.range" = "2022/01/01,NOW",
             "projection.day.format" = "yyyy/MM/dd",
             "projection.day.interval" = "1",
             "projection.day.interval.unit" = "DAYS",
             "storage.location.template" = "s3://your-alb-logs-directory/AWSLogs/1111222233334444/elasticloadbalancing/<REGION>/${day}"
            )

Be sure to replace the table name and S3 locations according to your use case.

Or, you can create a table with partitions using the following query and load the partitions using the ALTER TABLE ADD PARTITION command.

CREATE EXTERNAL TABLE IF NOT EXISTS alb_logs_partitioned (
type string,
time string,
elb string,
client_ip string,
client_port int,
target_ip string,
target_port int,
request_processing_time double,
target_processing_time double,
response_processing_time double,
elb_status_code string,
target_status_code string,
received_bytes bigint,
sent_bytes bigint,
request_verb string,
request_url string,
request_proto string,
user_agent string,
ssl_cipher string,
ssl_protocol string,
target_group_arn string,
trace_id string,
domain_name string,
chosen_cert_arn string,
matched_rule_priority string,
request_creation_time string,
actions_executed string,
redirect_url string,
lambda_error_reason string,
target_port_list string,
target_status_code_list string,
classification string,
classification_reason string
)
PARTITIONED BY(day string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1',
'input.regex' =
'([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"([^\"]*)\" ([A-Z0-9-]+) ([A-Za-z0-9.-]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" ([-.0-9]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^ ]*)\" \"([^\s]+?)\" \"([^\s]+)\" \"([^ ]*)\" \"([^ ]*)\"')
LOCATION 's3://my_log_bucket/AWSLogs/1111222233334444/elasticloadbalancing/us-east-1/'

ALTER TABLE alb_logs_partitioned ADD
  PARTITION (day = '2022/05/21')
  LOCATION 's3://my_log_bucket/AWSLogs/1111222233334444/elasticloadbalancing/us-east-1/2022/05/21/'
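After you add a partition, you can confirm that Athena registered it. This is a minimal check that assumes the alb_logs_partitioned table from the previous query:

```sql
-- List the partitions that Athena currently knows about for this table
SHOW PARTITIONS alb_logs_partitioned;
```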

Note: It's not a best practice to use an AWS Glue crawler on the Application Load Balancer logs.

4.    Under Tables in the navigation pane, choose Preview table from the menu button that's next to the table name. The data from the Application Load Balancer access logs is visible in the Results window.

5.    Use the Query editor to run SQL statements on the table. You can save queries, view previous queries, or download query results in CSV format.

Example queries

In the following examples, be sure to modify the table name, column values, and other variables to fit your query.


View the first 100 access log entries in chronological order.

Use case: analysis and troubleshooting

SELECT *
FROM alb_log
ORDER by time ASC
LIMIT 100;

List all client IP addresses that accessed the Application Load Balancer, and how many times they accessed the Application Load Balancer.

Use case: analysis and troubleshooting

SELECT client_ip, count(*) as count from alb_log
GROUP by client_ip
ORDER by count DESC;

List the average amount of data (in kilobytes) that is passing through the Application Load Balancer in request/response pairs.

Use case: analysis and troubleshooting

SELECT (avg(sent_bytes)/1000.0 + avg(received_bytes)/1000.0)
as prewarm_kilobytes from alb_log;

List all targets that the Application Load Balancer is routing traffic to and how many times the Application Load Balancer has routed requests to each target, by percentage distribution.

Use case: identify potential target traffic imbalances

SELECT target_ip, (count(target_ip) * 100.0 / (SELECT count(*) FROM alb_log))
as backend_traffic_percentage
FROM alb_log
GROUP by target_ip
ORDER by backend_traffic_percentage DESC;

List the times that a client sent a request to the Application Load Balancer and then closed the connection to the Application Load Balancer before the idle timeout elapsed (HTTP 460 error).

Use case: troubleshoot HTTP 460 errors

SELECT * from alb_log where elb_status_code = 460;

List the times that a client request wasn't routed because the listener rule forwarded the request to an empty target group (HTTP 503 error).

Use case: troubleshoot HTTP 503 errors

SELECT * from alb_log where elb_status_code = 503;
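Before drilling into a specific error code, it can help to see which status codes occur most often. This query is a sketch along the same lines as the examples above:

```sql
-- Count responses by load balancer status code, most frequent first
SELECT elb_status_code, count(*) as count
FROM alb_log
GROUP by elb_status_code
ORDER by count DESC;
```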

List clients in descending order, by the number of times that each client visited a specified URL.

Use case: analyze traffic patterns

SELECT client_ip, elb, request_url, count(*) as count from alb_log
GROUP by client_ip, elb, request_url
ORDER by count DESC;

List the 10 URLs that Firefox users accessed most frequently, in descending order.

Use case: analyze traffic distribution and patterns

SELECT request_url, user_agent, count(*) as count
FROM alb_log
WHERE user_agent LIKE '%Firefox%'
GROUP by request_url, user_agent
ORDER by count(*) DESC
LIMIT 10;

List clients in descending order, by the amount of data (in megabytes) that each client sent in their requests to the Application Load Balancer.

Use case: analyze traffic distribution and patterns

SELECT client_ip, sum(received_bytes/1000000.0) as client_datareceived_megabytes
FROM alb_log
GROUP by client_ip
ORDER by client_datareceived_megabytes DESC;

List each time in a specified date range when the target processing time was more than 5 seconds.

Use case: troubleshoot latency in a specified time frame

SELECT * from alb_log
WHERE (parse_datetime(time,'yyyy-MM-dd''T''HH:mm:ss.SSSSSS''Z')
    BETWEEN parse_datetime('2018-08-08-00:00:00','yyyy-MM-dd-HH:mm:ss')
    AND parse_datetime('2018-08-08-02:00:00','yyyy-MM-dd-HH:mm:ss'))
AND (target_processing_time >= 5.0);

Count the number of HTTP GET requests received by the load balancer grouped by the client IP address.

Use case: analyze incoming traffic distribution

SELECT COUNT(request_verb) AS count, request_verb, client_ip
FROM alb_logs
WHERE day = '2022/05/21'
GROUP BY request_verb, client_ip;
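Because day is a projected partition key in the alb_logs table, you can also restrict a query to a range of days so that Athena scans only the matching S3 prefixes. This sketch assumes the 'yyyy/MM/dd' day format from the table definition, where string comparison matches chronological order:

```sql
-- Count HTTP GET requests per client IP across a range of daily partitions
SELECT COUNT(request_verb) AS count, request_verb, client_ip
FROM alb_logs
WHERE day BETWEEN '2022/05/01' AND '2022/05/21'
GROUP BY request_verb, client_ip;
```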
