How can I analyze custom VPC Flow Logs using CloudWatch Logs Insights?

Last updated: 2022-04-27

I have configured custom VPC Flow Logs. How can I discover patterns and trends with Amazon CloudWatch Logs Insights?

Short description

You can use CloudWatch Logs Insights to analyze VPC Flow Logs. CloudWatch Logs Insights automatically discovers fields in many Amazon-provided logs and in JSON-formatted log events. This makes it easier to build queries and explore your logs. For VPC Flow Logs that use the default format, CloudWatch Logs Insights discovers the fields automatically.

However, the VPC Flow Logs in this example use a custom format. CloudWatch Logs Insights doesn't automatically discover fields in custom-format logs, so you must modify your queries to parse those fields. This article gives several example queries that you can customize and extend for your use cases.

This article uses the following custom VPC Flow Logs format:

${account-id} ${vpc-id} ${subnet-id} ${interface-id} ${instance-id} ${srcaddr} ${srcport} ${dstaddr} ${dstport} ${protocol} ${packets} ${bytes} ${action} ${log-status} ${start} ${end} ${flow-direction} ${traffic-path} ${tcp-flags} ${pkt-srcaddr} ${pkt-src-aws-service} ${pkt-dstaddr} ${pkt-dst-aws-service} ${region} ${az-id} ${sublocation-type} ${sublocation-id}
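
The parse pattern in the following queries has one * for each of the 27 fields in this format, so it assigns one value to each field name. The following Python sketch does the same split locally to show how the field positions line up. It is an illustration only and isn't part of CloudWatch. It assumes that values are separated by single spaces. make_line is a hypothetical helper for building sample records, and the sample values are invented.

```python
# Illustrative sketch: split one custom-format VPC Flow Log record into the
# 27 field names used by the parse command in this article's queries.
# Assumption: values are separated by single spaces.

FIELDS = (
    "account_id vpc_id subnet_id interface_id instance_id srcaddr srcport "
    "dstaddr dstport protocol packets bytes action log_status start end "
    "flow_direction traffic_path tcp_flags pkt_srcaddr pkt_src_aws_service "
    "pkt_dstaddr pkt_dst_aws_service region az_id sublocation_type "
    "sublocation_id"
).split()


def parse(message: str) -> dict[str, str]:
    """Map each space-separated value to its field name (values stay strings)."""
    # maxsplit keeps any leftover text in the last field, like a trailing "*"
    values = message.split(" ", len(FIELDS) - 1)
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


def make_line(**values: object) -> str:
    """Hypothetical helper: build a sample record, using "-" for unset fields."""
    return " ".join(str(values.get(name, "-")) for name in FIELDS)


if __name__ == "__main__":
    sample = make_line(srcaddr="10.0.0.5", srcport=55125, dstaddr="198.51.100.7",
                       dstport=443, protocol=6, action="ACCEPT")
    record = parse(sample)
    print(record["srcaddr"], record["dstport"], record["action"])
```

Every parsed value is a string. A local script therefore has to convert fields such as bytes or start explicitly before it does arithmetic or numeric sorting.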

Resolution

Retrieve latest VPC Flow Logs

CloudWatch Logs Insights doesn't automatically discover the fields in custom-format logs, so use the parse command to extract the fields that you need. This query sorts the results by the flow log event start time and limits them to the two most recent log entries.

Query

#Retrieve latest custom VPC Flow Logs
parse @message "* * * * * * * * * * * * * * * * * * * * * * * * * * *" as account_id, vpc_id, subnet_id, interface_id, instance_id, srcaddr, srcport, dstaddr, dstport, protocol, packets, bytes, action, log_status, start, end, flow_direction, traffic_path, tcp_flags, pkt_srcaddr, pkt_src_aws_service, pkt_dstaddr, pkt_dst_aws_service, region, az_id, sublocation_type, sublocation_id
| sort start desc
| limit 2

Results


account_id    vpc_id                 subnet_id                 interface_id           instance_id  srcaddr        srcport
123456789012  vpc-0b69ce8d04278ddd   subnet-002bdfe1767d0ddb0  eni-0435cbb62960f230e               172.31.0.104   55125
123456789012  vpc-0b69ce8d04278ddd1  subnet-002bdfe1767d0ddb0  eni-0435cbb62960f230e               91.240.118.81  49422
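
The following Python sketch shows the sort and limit steps. It runs on records that have already been split into fields, for example by a parse step like the one in this query. It assumes that each record is a dict of strings and that start holds Unix epoch seconds. The sample records are hypothetical.

```python
# Sketch of "| sort start desc | limit 2" over already-parsed records.
# Assumption: each record is a dict of strings and "start" is epoch seconds.


def latest(records: list[dict[str, str]], n: int = 2) -> list[dict[str, str]]:
    """Return the n records with the most recent start time."""
    # Convert to int so ordering is numeric (1000 ranks above 999)
    return sorted(records, key=lambda r: int(r["start"]), reverse=True)[:n]


if __name__ == "__main__":
    # Hypothetical records that use documentation IP ranges
    records = [
        {"srcaddr": "10.0.0.5", "start": "1648826580"},
        {"srcaddr": "203.0.113.10", "start": "1648826640"},
        {"srcaddr": "198.51.100.7", "start": "1648826520"},
    ]
    for r in latest(records):
        print(r["srcaddr"], r["start"])
```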

Summarize data transfers by source/destination IP address pairs

Next, summarize the network traffic by source/destination IP address pairs. In this example, the sum statistic aggregates the bytes field to calculate the total data transferred between each pair of hosts. The flow_direction field is also included as a grouping field for more context. The result of the aggregation is assigned to the temporary field Data_Transferred. The results are then sorted by Data_Transferred in descending order, and the two largest pairs are returned.

Query

parse @message "* * * * * * * * * * * * * * * * * * * * * * * * * * *" as account_id, vpc_id, subnet_id, interface_id, instance_id, srcaddr, srcport, dstaddr, dstport, protocol, packets, bytes, action, log_status, start, end, flow_direction, traffic_path, tcp_flags, pkt_srcaddr, pkt_src_aws_service, pkt_dstaddr, pkt_dst_aws_service, region, az_id, sublocation_type, sublocation_id
| stats sum(bytes) as Data_Transferred by srcaddr, dstaddr, flow_direction
| sort by Data_Transferred desc
| limit 2

Results

srcaddr dstaddr flow_direction Data_Transferred
172.31.1.247 3.230.172.154 egress 346952038
172.31.0.46 3.230.172.154 egress 343799447
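
This aggregation is a grouped sum followed by a top-N selection. The following Python sketch mirrors it with collections.Counter. It assumes records that are already parsed into dicts of strings, and the sample values are hypothetical. The sketch skips records whose bytes value isn't numeric (for example, -). That is a choice made for this sketch, not documented CloudWatch Logs Insights behavior.

```python
# Sketch of:
#   | stats sum(bytes) as Data_Transferred by srcaddr, dstaddr, flow_direction
#   | sort by Data_Transferred desc | limit 2
from collections import Counter


def data_transferred(records: list[dict[str, str]], n: int = 2):
    """Return the n (srcaddr, dstaddr, flow_direction) groups with the most bytes."""
    totals: Counter = Counter()
    for r in records:
        if not r["bytes"].isdigit():
            continue  # this sketch ignores records without a byte count
        key = (r["srcaddr"], r["dstaddr"], r["flow_direction"])
        totals[key] += int(r["bytes"])
    # most_common orders groups by total, largest first, and keeps the top n
    return totals.most_common(n)


if __name__ == "__main__":
    records = [  # hypothetical records
        {"srcaddr": "10.0.0.5", "dstaddr": "203.0.113.10", "flow_direction": "egress", "bytes": "300"},
        {"srcaddr": "10.0.0.5", "dstaddr": "203.0.113.10", "flow_direction": "egress", "bytes": "200"},
        {"srcaddr": "10.0.0.6", "dstaddr": "203.0.113.10", "flow_direction": "egress", "bytes": "400"},
    ]
    for (src, dst, direction), total in data_transferred(records):
        print(src, dst, direction, total)
```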

Analyze data transfers by EC2 instance ID

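This custom format includes the instance-id field, so you can analyze traffic by Amazon Elastic Compute Cloud (Amazon EC2) instance ID directly. Group the previous query by the instance_id field to find the most active EC2 instances. In the results, a value of - means that no instance that you own is associated with the network interface. For example, you see this value for a requester-managed network interface, such as the network interface for a NAT gateway.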

Query

parse @message "* * * * * * * * * * * * * * * * * * * * * * * * * * *" as account_id, vpc_id, subnet_id, interface_id, instance_id, srcaddr, srcport, dstaddr, dstport, protocol, packets, bytes, action, log_status, start, end, flow_direction, traffic_path, tcp_flags, pkt_srcaddr, pkt_src_aws_service, pkt_dstaddr, pkt_dst_aws_service, region, az_id, sublocation_type, sublocation_id
| stats sum(bytes) as Data_Transferred by instance_id
| sort by Data_Transferred desc
| limit 5

Results

instance_id Data_Transferred
- 1443477306
i-03205758c9203c979 517558754
i-0ae33894105aa500c 324629414
i-01506ab9e9e90749d 198063232
i-0724007fef3cb06f3 54847643
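
Grouping by instance_id works the same way as grouping by address pairs. The following Python sketch assumes records that are already parsed into dicts of strings, and the sample values and instance IDs are hypothetical. It treats - as an ordinary group key, so - can appear as its own row, as it does in the preceding results.

```python
# Sketch of:
#   | stats sum(bytes) as Data_Transferred by instance_id
#   | sort by Data_Transferred desc | limit 5
from collections import Counter


def top_instances(records: list[dict[str, str]], n: int = 5):
    """Sum bytes per instance_id and return the n largest totals."""
    totals: Counter = Counter()
    for r in records:
        if r["bytes"].isdigit():  # skip records without a byte count
            # "-" (no instance that you own) is treated like any other key
            totals[r["instance_id"]] += int(r["bytes"])
    return totals.most_common(n)


if __name__ == "__main__":
    records = [  # hypothetical records
        {"instance_id": "-", "bytes": "900"},
        {"instance_id": "i-0aaaaaaaaaaaaaaa1", "bytes": "500"},
    ]
    print(top_instances(records))
```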

Filter for rejected SSH traffic

To better understand the traffic that your security groups and network access control lists (network ACLs) denied, filter for flow log records with the REJECT action. You can narrow this filter further by protocol and destination port. To identify hosts whose SSH traffic was rejected, extend the filter to match TCP (protocol number 6) and destination port 22.

Query

parse @message "* * * * * * * * * * * * * * * * * * * * * * * * * * *" as account_id, vpc_id, subnet_id, interface_id, instance_id, srcaddr, srcport, dstaddr, dstport, protocol, packets, bytes, action, log_status, start, end, flow_direction, traffic_path, tcp_flags, pkt_srcaddr, pkt_src_aws_service, pkt_dstaddr, pkt_dst_aws_service, region, az_id, sublocation_type, sublocation_id
| filter action = "REJECT" and protocol = 6 and dstport = 22
| stats sum(bytes) as SSH_Traffic_Volume by srcaddr
| sort by SSH_Traffic_Volume desc
| limit 2

Results

srcaddr SSH_Traffic_Volume
23.95.222.129 160
179.43.167.74 80
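
The filter step runs before the aggregation, so only matching records contribute to SSH_Traffic_Volume. The following Python sketch implements the same pipeline. Parsed values are strings in this sketch, so it compares protocol and dstport against "6" and "22". The record layout and the sample data are assumptions for illustration.

```python
# Sketch of:
#   | filter action = "REJECT" and protocol = 6 and dstport = 22
#   | stats sum(bytes) as SSH_Traffic_Volume by srcaddr
#   | sort by SSH_Traffic_Volume desc | limit 2
from collections import Counter


def rejected_ssh(records: list[dict[str, str]], n: int = 2):
    """Total rejected TCP port 22 bytes per source address, largest first."""
    totals: Counter = Counter()
    for r in records:
        is_rejected_ssh = (
            r["action"] == "REJECT" and r["protocol"] == "6" and r["dstport"] == "22"
        )
        if is_rejected_ssh and r["bytes"].isdigit():
            totals[r["srcaddr"]] += int(r["bytes"])
    return totals.most_common(n)


if __name__ == "__main__":
    records = [  # hypothetical records
        {"srcaddr": "203.0.113.10", "action": "REJECT", "protocol": "6", "dstport": "22", "bytes": "40"},
        {"srcaddr": "10.0.0.9", "action": "ACCEPT", "protocol": "6", "dstport": "22", "bytes": "999"},
    ]
    print(rejected_ssh(records))
```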

Isolate HTTP data stream for a specific source/destination pair

To investigate trends in your data further, isolate the bidirectional traffic between two IP addresses. In this query, the filter requires both srcaddr and dstaddr to be in the list ["172.31.1.247","172.31.11.212"], so it returns flow logs for traffic in either direction between these two hosts. To isolate HTTP traffic, the filter also matches flow log events with protocol 6 (TCP) and a source or destination port of 80. The display command returns only a subset of the available fields.

Query

#HTTP Data Stream for Specific Source/Destination Pair
parse @message "* * * * * * * * * * * * * * * * * * * * * * * * * * *" as account_id, vpc_id, subnet_id, interface_id, instance_id, srcaddr, srcport, dstaddr, dstport, protocol, packets, bytes, action, log_status, start, end, flow_direction, traffic_path, tcp_flags, pkt_srcaddr, pkt_src_aws_service, pkt_dstaddr, pkt_dst_aws_service, region, az_id, sublocation_type, sublocation_id
| filter srcaddr in ["172.31.1.247","172.31.11.212"] and dstaddr in ["172.31.1.247","172.31.11.212"] and protocol = 6 and (dstport = 80 or srcport = 80)
| display interface_id, srcaddr, srcport, dstaddr, dstport, protocol, bytes, action, log_status, start, end, flow_direction, tcp_flags
| sort by start desc
| limit 2

Results

interface_id srcaddr srcport dstaddr dstport protocol bytes action log_status
eni-0b74120275654905e 172.31.11.212 80 172.31.1.247 29376 6 5160876 ACCEPT OK
eni-0b74120275654905e 172.31.1.247 29376 172.31.11.212 80 6 97380 ACCEPT OK
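
The query applies the same two-address list to both srcaddr and dstaddr, so the filter keeps traffic in both directions between the hosts. The following Python sketch applies the same filter, sort, limit, and display steps to records that are already parsed into dicts of strings. The two addresses and the displayed field list come from the query. Everything else, including the sample records, is an assumption for illustration.

```python
# Sketch of:
#   | filter srcaddr in [...] and dstaddr in [...] and protocol = 6
#            and (dstport = 80 or srcport = 80)
#   | display interface_id, srcaddr, ... | sort by start desc | limit 2

PAIR = frozenset({"172.31.1.247", "172.31.11.212"})
DISPLAY = [
    "interface_id", "srcaddr", "srcport", "dstaddr", "dstport", "protocol",
    "bytes", "action", "log_status", "start", "end", "flow_direction", "tcp_flags",
]


def http_stream(records: list[dict[str, str]], pair: frozenset[str] = PAIR, n: int = 2):
    """Return the n newest HTTP records exchanged between the two addresses."""
    matches = [
        r for r in records
        if r["srcaddr"] in pair and r["dstaddr"] in pair
        and r["protocol"] == "6" and "80" in (r["srcport"], r["dstport"])
    ]
    matches.sort(key=lambda r: int(r["start"]), reverse=True)
    # Like display, keep only the selected fields, in the listed order
    return [{field: r[field] for field in DISPLAY} for r in matches[:n]]
```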

Visualize the HTTP data stream for a specific source/destination pair

You can use CloudWatch Logs Insights to visualize query results as a bar chart or pie chart. If the query groups results with the bin() function, the results are returned as a time series. You can visualize a time series as a line chart or stacked area chart.

Building on the previous query, you can use stats sum(bytes) as Data_Transferred by bin(1m) to calculate the total data transferred in each one-minute interval. To view the chart, switch from the Logs tab to the Visualization tab in the CloudWatch Logs Insights console.

Query

parse @message "* * * * * * * * * * * * * * * * * * * * * * * * * * *" as account_id, vpc_id, subnet_id, interface_id, instance_id, srcaddr, srcport, dstaddr, dstport, protocol, packets, bytes, action, log_status, start, end, flow_direction, traffic_path, tcp_flags, pkt_srcaddr, pkt_src_aws_service, pkt_dstaddr, pkt_dst_aws_service, region, az_id, sublocation_type, sublocation_id
| filter srcaddr in ["172.31.1.247","172.31.11.212"] and dstaddr in ["172.31.1.247","172.31.11.212"] and protocol = 6 and (dstport = 80 or srcport = 80)
| stats sum(bytes) as Data_Transferred by bin(1m)

Results

bin(1m) Data_Transferred
2022-04-01 15:23:00.000 17225787
2022-04-01 15:21:00.000 17724499
2022-04-01 15:20:00.000 1125500
2022-04-01 15:19:00.000 101525
2022-04-01 15:18:00.000 81376
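
With bin(1m), the sum is calculated per one-minute bucket instead of over the whole result set. The following Python sketch reproduces that grouping for records that have already passed the HTTP filter. Note one assumption: Logs Insights bins on @timestamp, but this sketch uses the flow log start field (epoch seconds) as a stand-in and formats buckets in UTC. The sample records are hypothetical.

```python
# Sketch of "| stats sum(bytes) as Data_Transferred by bin(1m)".
# Assumption: "start" (epoch seconds) stands in for @timestamp.
from collections import Counter
from datetime import datetime, timezone


def bytes_per_minute(records: list[dict[str, str]]) -> list[tuple[str, int]]:
    """Sum bytes per one-minute bucket, newest bucket first."""
    totals: Counter = Counter()
    for r in records:
        ts = int(r["start"])
        totals[ts - ts % 60] += int(r["bytes"])  # floor to the start of the minute
    return [
        (datetime.fromtimestamp(bucket, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S.000"), total)
        for bucket, total in sorted(totals.items(), reverse=True)
    ]


if __name__ == "__main__":
    records = [  # hypothetical, already-filtered records
        {"start": "1648826580", "bytes": "100"},
        {"start": "1648826599", "bytes": "50"},
    ]
    for minute, total in bytes_per_minute(records):
        print(minute, total)
```

In this sketch, a minute with no matching records produces no row.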