AWS News Blog

Learn From Your VPC Flow Logs With Additional Meta-Data

Voiced by Polly

Flow Logs for Amazon Virtual Private Cloud (Amazon VPC) enables you to capture information about the IP traffic going to and from network interfaces in your VPC. Flow Logs data can be published to Amazon CloudWatch Logs or Amazon Simple Storage Service (Amazon S3).

Since we launched VPC Flow Logs in 2015, you have been using it for variety of use-cases like troubleshooting connectivity issues across your VPCs, intrusion detection, anomaly detection, or archival for compliance purposes. Until today, VPC Flow Logs provided information that included source IP, source port, destination IP, destination port, action (accept, reject) and status. Once enabled, a VPC Flow Log entry looks like the one below.

While this information was sufficient to understand most flows, it required additional computation and lookup to match IP addresses to instance IDs or to guess the directionality of the flow to come to meaningful conclusions.

Today we are announcing the availability of additional meta data to include in your Flow Logs records to better understand network flows. The enriched Flow Logs will allow you to simplify your scripts or remove the need for postprocessing altogether, by reducing the number of computations or lookups required to extract meaningful information from the log data.

When you create a new VPC Flow Log, in addition to existing fields, you can now choose to add the following meta-data:

  • vpc-id : the ID of the VPC containing the source Elastic Network Interface (ENI).
  • subnet-id : the ID of the subnet containing the source ENI.
  • instance-id : the Amazon Elastic Compute Cloud (Amazon EC2) instance ID of the instance associated with the source interface. When the ENI is placed by AWS services (for example, AWS PrivateLink, NAT Gateway, Network Load Balancer etc) this field will be “-
  • tcp-flags : the bitmask for TCP Flags observed within the aggregation period. For example, FIN is 0x01 (1), SYN is 0x02 (2), ACK is 0x10 (16), SYN + ACK is 0x12 (18), etc. (the bits are specified in “Control Bits” section of RFC793 “Transmission Control Protocol Specification”).
    This allows to understand who initiated or terminated the connection. TCP uses a three way handshake to establish a connection. The connecting machine sends a SYN packet to the destination, the destination replies with a SYN + ACK and, finally, the connecting machine sends an ACK. In the Flow Logs, the handshake is shown as two lines, with tcp-flags values of 2 (SYN), 18 (SYN + ACK).  ACK is reported only when it is accompanied with SYN (otherwise it would be too much noise for you to filter out).
  • type : the type of traffic : IPV4, IPV6 or Elastic Fabric Adapter.
  • pkt-srcaddr : the packet-level IP address of the source. You typically use this field in conjunction with srcaddr to distinguish between the IP address of an intermediate layer through which traffic flows, such as a NAT gateway.
  • pkt-dstaddr : the packet-level destination IP address, similar to the previous one, but for destination IP addresses.

To create a VPC Flow Log, you can use the AWS Management Console, the AWS Command Line Interface (AWS CLI) or the CreateFlowLogs API, select S3 as destination, and select which additional information and the order you want to consume the fields, for example:

Or using the AWS Command Line Interface (AWS CLI) as below:

$ aws ec2 create-flow-logs --resource-type VPC \
                            --region eu-west-1 \
                            --resource-ids vpc-12345678 \
                            --traffic-type ALL  \
                            --log-destination-type s3 \
                            --log-destination arn:aws:s3:::sst-vpc-demo \
                            --log-format '${version} ${vpc-id} ${subnet-id} ${instance-id} ${interface-id} ${account-id} ${type} ${srcaddr} ${dstaddr} ${srcport} ${dstport} ${pkt-srcaddr} ${pkt-dstaddr} ${protocol} ${bytes} ${packets} ${start} ${end} ${action} ${tcp-flags} ${log-status}'

# be sure to replace the bucket name and VPC ID !

{
    "ClientToken": "1A....HoP=",
    "FlowLogIds": [
        "fl-12345678123456789"
    ],
    "Unsuccessful": [] 
}

Enriched VPC Flow Logs are delivered to S3. We will automatically add the required S3 Bucket Policy to authorize VPC Flow Logs to write to your S3 bucket. VPC Flow Logs does not capture real-time log streams for your network interface, it might take several minutes to begin collecting and publishing data to the chosen destinations. Your logs will eventually be available on S3 at s3://<bucket name>/AWSLogs/<account id>/vpcflowlogs/<region>/<year>/<month>/<day>/

An SSH connection from my laptop with IP address 90.90.0.200 to an EC2 instance would appear like this :

3 vpc-exxxxxx2 subnet-8xxxxf3 i-0bfxxxxxxaf eni-08xxxxxxa5 48xxxxxx93 IPv4 172.31.22.145 90.90.0.200 22 62897 172.31.22.145 90.90.0.200 6 5225 24 1566328660 1566328672 ACCEPT 18 OK
3 vpc-exxxxxx2 subnet-8xxxxf3 i-0bfxxxxxxaf eni-08xxxxxxa5 48xxxxxx93 IPv4 90.90.0.200 172.31.22.145 62897 22 90.90.0.200 172.31.22.145 6 4877 29 1566328660 1566328672 ACCEPT 2 OK

172.31.22.145 is the private IP address of the EC2 instance, the one you see when you type ifconfig on the instance.  All flags are “OR”ed during aggregation period. When connection is short, probably both SYN and FIN (3), as well as SYN+ACK and FIN (19) will be set for the same lines.

Once a Flow Log is created, you can not add additional fields or modify the structure of the log to ensure you will not accidently break scripts consuming this data. Any modification will require you to delete and recreate the VPC Flow Logs. There is no additional cost to capture the extra information in the VPC Flow Logs, normal VPC Flow Log pricing applies, remember that Enriched VPC Flow Log records might consume more storage when selecting all fields.  We do recommend to select only the fields relevant to your use-cases.

Enriched VPC Flow Logs is available in all regions where VPC Flow logs is available, you can start to use it today.

-- seb

PS: I heard from the team they are working on adding additional meta-data to the logs, stay tuned for updates.

Sébastien Stormacq

Sébastien Stormacq

Seb has been writing code since he first touched a Commodore 64 in the mid-eighties. He inspires builders to unlock the value of the AWS cloud, using his secret blend of passion, enthusiasm, customer advocacy, curiosity and creativity. His interests are software architecture, developer tools and mobile computing. If you want to sell him something, be sure it has an API. Follow him on Twitter @sebsto.