AWS News Blog
Learn From Your VPC Flow Logs With Additional Meta-Data
Flow Logs for Amazon Virtual Private Cloud (Amazon VPC) enables you to capture information about the IP traffic going to and from network interfaces in your VPC. Flow Logs data can be published to Amazon CloudWatch Logs or Amazon Simple Storage Service (Amazon S3).
Since we launched VPC Flow Logs in 2015, you have been using it for a variety of use cases, such as troubleshooting connectivity issues across your VPCs, intrusion detection, anomaly detection, or archival for compliance purposes. Until today, VPC Flow Logs provided information that included source IP, source port, destination IP, destination port, action (accept, reject) and status. Once enabled, a VPC Flow Log entry looks like the one below.
While this information was sufficient to understand most flows, it required additional computation and lookup to match IP addresses to instance IDs or to guess the directionality of the flow to come to meaningful conclusions.
Today we are announcing the availability of additional meta-data to include in your Flow Logs records to better understand network flows. The enriched Flow Logs allow you to simplify your scripts, or remove the need for postprocessing altogether, by reducing the number of computations or lookups required to extract meaningful information from the log data.
When you create a new VPC Flow Log, in addition to existing fields, you can now choose to add the following meta-data:
vpc-id: the ID of the VPC containing the source Elastic Network Interface (ENI).
subnet-id: the ID of the subnet containing the source ENI.
instance-id: the Amazon Elastic Compute Cloud (Amazon EC2) instance ID of the instance associated with the source interface. When the ENI is placed by AWS services (for example, AWS PrivateLink, NAT Gateway, Network Load Balancer, etc.), this field will be “-”.
tcp-flags: the bitmask for TCP flags observed within the aggregation period. For example, FIN is 0x01 (1), SYN is 0x02 (2), ACK is 0x10 (16), SYN + ACK is 0x12 (18), etc. (the bits are specified in the “Control Bits” section of RFC 793, “Transmission Control Protocol Specification”). This allows you to understand who initiated or terminated the connection. TCP uses a three-way handshake to establish a connection. The connecting machine sends a SYN packet to the destination, the destination replies with a SYN + ACK and, finally, the connecting machine sends an ACK. In the Flow Logs, the handshake is shown as two lines, with tcp-flags values of 2 (SYN) and 18 (SYN + ACK). ACK is reported only when it is accompanied by SYN (otherwise it would be too much noise for you to filter out).
type: the type of traffic: IPv4, IPv6 or Elastic Fabric Adapter.
pkt-srcaddr: the packet-level IP address of the source. You typically use this field in conjunction with srcaddr to distinguish between the IP address of an intermediate layer through which traffic flows, such as a NAT gateway, and the original source IP address of the traffic.
pkt-dstaddr: the packet-level destination IP address, similar to the previous one, but for destination IP addresses.
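To make the tcp-flags bitmask concrete, here is a minimal Python sketch that decodes a value into flag names. The FIN, SYN and ACK values are the ones listed above; RST, PSH and URG are taken from the “Control Bits” section of RFC 793 and are included only for completeness, since this post does not say whether Flow Logs reports them. The function names decode_tcp_flags and describe_handshake_role are hypothetical helpers, not part of any AWS SDK.

```python
# Bit values from the "Control Bits" section of RFC 793.
# FIN, SYN and ACK are quoted in the post; RST, PSH and URG are added from
# the RFC and may never appear in Flow Logs records (assumption).
TCP_FLAG_BITS = {
    "FIN": 0x01,
    "SYN": 0x02,
    "RST": 0x04,
    "PSH": 0x08,
    "ACK": 0x10,
    "URG": 0x20,
}


def decode_tcp_flags(value):
    """Return the names of the TCP flags set in a tcp-flags bitmask."""
    return [name for name, bit in TCP_FLAG_BITS.items() if value & bit]


def describe_handshake_role(value):
    """Classify a record by the handshake flags it carries (hypothetical helper)."""
    flags = set(decode_tcp_flags(value))
    if {"SYN", "ACK"} <= flags:
        return "responder"  # SYN + ACK: the side that accepted the connection
    if "SYN" in flags:
        return "initiator"  # SYN without ACK: the side that opened the connection
    return "unknown"


if __name__ == "__main__":
    for value in (2, 18, 3, 19):
        print(value, decode_tcp_flags(value), describe_handshake_role(value))
```

Because the flags are a bitmask, a single value can carry several flags at once, which is why the helper tests each bit rather than comparing against fixed numbers.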
To create a VPC Flow Log, you can use the AWS Management Console, the AWS Command Line Interface (AWS CLI) or the CreateFlowLogs API. Select S3 as the destination, then choose the additional fields you want and the order in which you want to consume them.
For example, using the AWS CLI:
$ aws ec2 create-flow-logs --resource-type VPC \
    --region eu-west-1 \
    --resource-ids vpc-12345678 \
    --traffic-type ALL \
    --log-destination-type s3 \
    --log-destination arn:aws:s3:::sst-vpc-demo \
    --log-format '${version} ${vpc-id} ${subnet-id} ${instance-id} ${interface-id} ${account-id} ${type} ${srcaddr} ${dstaddr} ${srcport} ${dstport} ${pkt-srcaddr} ${pkt-dstaddr} ${protocol} ${bytes} ${packets} ${start} ${end} ${action} ${tcp-flags} ${log-status}'
# be sure to replace the bucket name and VPC ID!
{
    "ClientToken": "1A....HoP=",
    "FlowLogIds": [
        "fl-12345678123456789"
    ],
    "Unsuccessful": []
}
Enriched VPC Flow Logs are delivered to S3. We will automatically add the required S3 bucket policy to authorize VPC Flow Logs to write to your S3 bucket. VPC Flow Logs does not capture real-time log streams for your network interface; it might take several minutes to begin collecting and publishing data to the chosen destination. Your logs will eventually be available on S3 at s3://<bucket name>/AWSLogs/<account id>/vpcflowlogs/<region>/<year>/<month>/<day>/
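If you want to locate a given day's logs from a script, the path layout above can be assembled like this. This is a sketch only: flow_log_prefix is a hypothetical helper, the account ID in the usage line is a placeholder, and the zero-padded two-digit month and day are an assumption about how the date components are written.

```python
from datetime import date


def flow_log_prefix(bucket, account_id, region, day):
    """Build the S3 location holding the flow logs for one day.

    Assumes month and day are zero-padded to two digits.
    """
    return (
        f"s3://{bucket}/AWSLogs/{account_id}/vpcflowlogs/"
        f"{region}/{day:%Y/%m/%d}/"
    )


if __name__ == "__main__":
    # Bucket and region come from the CLI example; the account ID is a placeholder.
    print(flow_log_prefix("sst-vpc-demo", "123456789012", "eu-west-1", date(2019, 8, 20)))
```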
An SSH connection from my laptop with IP address 90.90.0.200 to an EC2 instance would appear like this:
3 vpc-exxxxxx2 subnet-8xxxxf3 i-0bfxxxxxxaf eni-08xxxxxxa5 48xxxxxx93 IPv4 172.31.22.145 90.90.0.200 22 62897 172.31.22.145 90.90.0.200 6 5225 24 1566328660 1566328672 ACCEPT 18 OK
3 vpc-exxxxxx2 subnet-8xxxxf3 i-0bfxxxxxxaf eni-08xxxxxxa5 48xxxxxx93 IPv4 90.90.0.200 172.31.22.145 62897 22 90.90.0.200 172.31.22.145 6 4877 29 1566328660 1566328672 ACCEPT 2 OK
172.31.22.145 is the private IP address of the EC2 instance, the one you see when you type ifconfig on the instance. All flags are “OR”ed during the aggregation period. When a connection is short, probably both SYN and FIN (3), as well as SYN + ACK and FIN (19), will be set for the same lines.
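As an illustration of how little postprocessing these records need, here is a minimal Python sketch that maps each record onto the fields of the log format and uses tcp-flags to find who opened the connection. It assumes records are space-separated in the same order as the --log-format string used above; field_names, parse_record and connection_initiator are hypothetical helper names.

```python
import re

# Format string copied from the create-flow-logs command in this post.
LOG_FORMAT = (
    "${version} ${vpc-id} ${subnet-id} ${instance-id} ${interface-id} "
    "${account-id} ${type} ${srcaddr} ${dstaddr} ${srcport} ${dstport} "
    "${pkt-srcaddr} ${pkt-dstaddr} ${protocol} ${bytes} ${packets} "
    "${start} ${end} ${action} ${tcp-flags} ${log-status}"
)

SYN, ACK = 0x02, 0x10

# The two sample records shown in this post.
SAMPLE_LINES = [
    "3 vpc-exxxxxx2 subnet-8xxxxf3 i-0bfxxxxxxaf eni-08xxxxxxa5 48xxxxxx93 IPv4 "
    "172.31.22.145 90.90.0.200 22 62897 172.31.22.145 90.90.0.200 6 5225 24 "
    "1566328660 1566328672 ACCEPT 18 OK",
    "3 vpc-exxxxxx2 subnet-8xxxxf3 i-0bfxxxxxxaf eni-08xxxxxxa5 48xxxxxx93 IPv4 "
    "90.90.0.200 172.31.22.145 62897 22 90.90.0.200 172.31.22.145 6 4877 29 "
    "1566328660 1566328672 ACCEPT 2 OK",
]


def field_names(log_format):
    """Extract field names, in order, from a ${field} format string."""
    return re.findall(r"\$\{([^}]+)\}", log_format)


def parse_record(line, log_format=LOG_FORMAT):
    """Map one space-separated record onto the fields of log_format."""
    names = field_names(log_format)
    values = line.split()
    if len(values) != len(names):
        raise ValueError(f"expected {len(names)} fields, got {len(values)}")
    return dict(zip(names, values))


def connection_initiator(records):
    """Return srcaddr of the first record with SYN set and ACK clear, else None."""
    for record in records:
        raw = record["tcp-flags"]
        if not raw.isdigit():  # skip records without a numeric bitmask
            continue
        flags = int(raw)
        if flags & SYN and not flags & ACK:
            return record["srcaddr"]
    return None


records = [parse_record(line) for line in SAMPLE_LINES]

if __name__ == "__main__":
    print(connection_initiator(records))
```

For the two sample records, this prints 90.90.0.200, the address of my laptop. Testing individual bits rather than comparing against 2 means a short connection reported as 3 (SYN and FIN) is still recognized.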
Once a Flow Log is created, you cannot add fields or modify the structure of the log. This ensures you will not accidentally break scripts consuming this data. Any modification requires you to delete and recreate the VPC Flow Log. There is no additional cost to capture the extra information in the VPC Flow Logs; normal VPC Flow Log pricing applies. Remember that Enriched VPC Flow Log records might consume more storage when you select all fields, so we recommend selecting only the fields relevant to your use cases.
Enriched VPC Flow Logs is available in all regions where VPC Flow Logs is available. You can start to use it today.
-- seb
PS: I heard from the team they are working on adding additional meta-data to the logs, stay tuned for updates.