Understand your network traffic trends using AWS Transit Gateway Flow Logs

AWS Transit Gateway is a network transit hub that enables you to connect thousands of Amazon Virtual Private Clouds (Amazon VPCs) and your on-premises networks using a single gateway. This simplifies your network connectivity and puts an end to complex peering relationships. AWS Transit Gateway Flow Logs enables you to export detailed telemetry information, such as source and destination IP addresses, ports, protocol, traffic counters, timestamps, and various metadata, for all the network traffic flows traversing your Transit Gateways. It can help you with many use cases, such as network troubleshooting, network capacity planning, compliance, and security. This post provides information on how Transit Gateway Flow Logs work, their record format, and how to create a new Transit Gateway Flow Logs subscription. You can use multiple AWS destinations, such as Amazon Simple Storage Service (Amazon S3), Amazon CloudWatch Logs, and Amazon Kinesis Data Firehose, to view your flow logs. You can also integrate with other AWS services, such as Amazon Athena, Amazon QuickSight, and CloudWatch Contributor Insights, or with partner solutions to analyze your Transit Gateway Flow Logs.

In this post, we walk you through how to use Athena to query Transit Gateway Flow Logs data stored in Amazon S3. We use predefined queries, created by an AWS CloudFormation template, that answer commonly asked questions.

Solution overview

The following diagram (AWS Transit Gateway – Amazon Athena Solution overview) shows a high-level overview of the resources that the CloudFormation template creates for this integration. You can also download and customize the CloudFormation template to change the infrastructure setup or the queries to suit your requirements.

AWS Transit Gateway – Amazon Athena Solution overview

The CloudFormation template creates the following resources:

  • A partitioned table in AWS Glue corresponding to the Transit Gateway Flow Logs records.
  • A database in Glue to store the Glue tables.
  • An AWS Lambda function that loads new partitions to the table daily.
  • An AWS Identity and Access Management (IAM) role that grants permission to run the Lambda function.
  • A workgroup in Athena to store the named queries, along with a set of named queries in the workgroup.
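
To make the daily partition load more concrete, here is a minimal sketch of the kind of statement such a Lambda function could issue. It is not the code from the CloudFormation template. The function name, the table name, the partition keys (year, month, day), and the bucket, account, and Region values are hypothetical placeholders. The S3 layout is an assumption based on the Hive-compatible prefix format, so verify it against the objects in your own bucket.

```shell
# Sketch only: builds an Athena DDL statement that registers one daily partition.
# Assumptions (verify against your table definition and bucket):
#   - the table is partitioned by year, month, and day
#   - flow logs land under
#     <base>/AWSLogs/aws-account-id=<id>/aws-service=vpcflowlogs/aws-region=<region>/year=YYYY/month=MM/day=DD/
build_add_partition_sql() {
  local table="$1" base="$2" account="$3" region="$4"
  local year="$5" month="$6" day="$7"
  local location="${base}/AWSLogs/aws-account-id=${account}/aws-service=vpcflowlogs/aws-region=${region}/year=${year}/month=${month}/day=${day}/"
  printf "ALTER TABLE %s ADD IF NOT EXISTS PARTITION (year='%s', month='%s', day='%s') LOCATION '%s'" \
    "$table" "$year" "$month" "$day" "$location"
}

# Example with placeholder values. It only prints the statement; nothing is sent to AWS.
build_add_partition_sql tgw_flow_logs s3://amzn-s3-demo-bucket/TGW 123456789012 us-east-1 2024 01 15
echo
```

Because the statement uses IF NOT EXISTS, running it more than once for the same day is harmless, which suits a function that runs on a daily schedule.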

Create Transit Gateway Flow Logs subscription

You must create the Transit Gateway Flow Logs subscription with an Amazon S3 bucket as the destination, using custom fields, the Parquet file format, Hive-compatible S3 prefixes, and the default 24-hour partition. For more information, see the Transit Gateway documentation. If you are creating flow logs from the AWS Management Console, choose “Select All” to select all the fields. The CloudFormation template provided in this post requires all available fields in Transit Gateway Flow Logs to create the Athena database. Once you have created a Transit Gateway Flow Logs subscription, the CloudFormation template creates the resources necessary to analyze the Transit Gateway Flow Logs in Amazon S3 using Athena.

If you want to create Transit Gateway Flow Logs using the AWS Command Line Interface (AWS CLI), open AWS CloudShell from the console and run the following command to create the Transit Gateway Flow Logs subscription.

aws ec2 create-flow-logs \
--resource-type TransitGateway \
--resource-ids {TGW_Id} \
--log-destination-type s3 \
--log-destination {S3_Arn}/TGW/ \
--destination-options FileFormat=parquet,HiveCompatiblePartitions=true,PerHourPartition=false \
--log-format '${version} ${resource-type} ${account-id} ${tgw-id} ${tgw-attachment-id} ${tgw-src-vpc-account-id} ${tgw-dst-vpc-account-id} ${tgw-src-vpc-id} ${tgw-dst-vpc-id} ${tgw-src-subnet-id} ${tgw-dst-subnet-id} ${tgw-src-eni} ${tgw-dst-eni} ${tgw-src-az-id} ${tgw-dst-az-id} ${tgw-pair-attachment-id} ${srcaddr} ${dstaddr} ${srcport} ${dstport} ${protocol} ${packets} ${bytes} ${start} ${end} ${log-status} ${type} ${packets-lost-no-route} ${packets-lost-blackhole} ${packets-lost-mtu-exceeded} ${packets-lost-ttl-expired} ${tcp-flags} ${region} ${flow-direction} ${pkt-src-aws-service} ${pkt-dst-aws-service}'
  • Replace {S3_Arn} with the actual Amazon Resource Name (ARN) of the S3 bucket where you want to store the Transit Gateway Flow Logs. For example, you can use “TGW” as the prefix for the Transit Gateway Flow Logs. Optionally, for additional security for your S3 bucket, we recommend that you turn on the bucket owner condition for AWS CLI or Amazon S3 REST API access types.
  • Replace {TGW_Id} with the Transit Gateway ID of your Transit Gateway for which you are creating flow logs.
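
The long --log-format value is easy to mistype. As a minimal sketch, you could build it from a plain list of field names instead. The field names are copied from the preceding command. The fields variable and the build_log_format helper are hypothetical names used only for illustration.

```shell
# Field names copied from the --log-format string in this post.
fields="version resource-type account-id tgw-id tgw-attachment-id
  tgw-src-vpc-account-id tgw-dst-vpc-account-id tgw-src-vpc-id tgw-dst-vpc-id
  tgw-src-subnet-id tgw-dst-subnet-id tgw-src-eni tgw-dst-eni
  tgw-src-az-id tgw-dst-az-id tgw-pair-attachment-id
  srcaddr dstaddr srcport dstport protocol packets bytes start end
  log-status type packets-lost-no-route packets-lost-blackhole
  packets-lost-mtu-exceeded packets-lost-ttl-expired tcp-flags region
  flow-direction pkt-src-aws-service pkt-dst-aws-service"

# Wrap each name as ${name} and join the results with single spaces.
build_log_format() {
  local out="" name
  for name in "$@"; do
    out="$out"'${'"$name"'} '
  done
  # Drop the trailing space added by the last iteration.
  printf '%s' "${out% }"
}

# $fields is intentionally unquoted so that it splits into one argument per name.
log_format="$(build_log_format $fields)"
printf '%s\n' "$log_format"
```

You can then pass the result to the command as --log-format "$log_format". The double quotes keep the value as a single argument, and the ${...} placeholders are not expanded again by the shell.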

Using Transit Gateway Flow Logs Athena integration

The following steps describe how to deploy the CloudFormation template, which creates the resources for querying Transit Gateway Flow Logs using Athena.

Step 1 – Create CloudFormation Stack

  • After you have created your Transit Gateway Flow Logs subscription with Amazon S3 as the destination, download the CloudFormation template from our GitHub repository.
  • Log in to the AWS Management Console and navigate to the CloudFormation service.
  • Choose the Create stack dropdown and select “With new resources (standard)”.
  • Select Upload a template file, and then select Choose file to upload the CloudFormation template that you downloaded, as shown in the diagram (CloudFormation Create Stack).
  • On the CloudFormation “Create stack” page, select Next.

    CloudFormation Create Stack

  • On the “Specify stack details” page, shown in the diagram (CloudFormation Stack details), provide the stack name “tgw-flowlogs-stack”.
  • Provide all the required parameters. The parameters TGWFlowLogsAthenaDatabaseName and TGWFlowLogsAthenaTableName are optional.
  • Select Next, and then select Next again on the “Configure stack options” page.

    CloudFormation Stack details

  • Review the stack configuration, select “I acknowledge that AWS CloudFormation might create IAM resources”, and select Submit.
  • The stack takes a few minutes to create. Once it completes, you should see the CREATE_COMPLETE status.
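
If you prefer to check the stack from the AWS CLI instead of the console, the following minimal sketch reads the stack status. The stack_status and stack_is_ready helper names are hypothetical, and the stack name assumes that you used “tgw-flowlogs-stack” as suggested above.

```shell
# Sketch: print the current status of a CloudFormation stack.
stack_status() {
  aws cloudformation describe-stacks \
    --stack-name "$1" \
    --query 'Stacks[0].StackStatus' \
    --output text
}

# Succeeds only when the stack has finished creating.
stack_is_ready() {
  [ "$(stack_status "$1")" = "CREATE_COMPLETE" ]
}

# Usage (requires the AWS CLI and credentials):
#   stack_is_ready tgw-flowlogs-stack && echo "Stack is ready"
```

To block until creation finishes instead of polling by hand, the AWS CLI also provides aws cloudformation wait stack-create-complete --stack-name tgw-flowlogs-stack.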

Step 2 – Analyze network traffic using predefined queries

Once your CloudFormation stack has been created, you can use Athena to analyze your Transit Gateway Flow Logs data. To do so, navigate to the Athena console. First, make sure that you select the correct Data Source and Database to query, as shown in the diagram (Amazon Athena Query Editor).

Amazon Athena Query Editor

Step 3 – Running a predefined query

The CloudFormation template provides a set of predefined queries that you can run to quickly get insights about the traffic in your AWS network. To access these predefined queries, switch to the workgroup created by the CloudFormation template, as shown in the diagram (Amazon Athena Workgroup). When you select the “TGWFlowLogsQueryWorkGroup” workgroup, a workgroup settings dialog asks you to acknowledge the query result settings. Select Acknowledge.

Amazon Athena Workgroup

Navigate to the Saved queries panel to see the list of predefined queries, as shown in the diagram (Amazon Athena Saved Queries). In this example, we have selected the “TGWFlowLogsTopTalkers” query to see the top 50 source IP addresses by traffic volume.

Amazon Athena Saved Queries

Select the query to open it in the Query editor, where you can review it and modify it as needed. Select “Run” to see the results of your query in Athena, as shown in the diagram (Amazon Athena Saved Queries). The results of the query are also saved in the S3 bucket you specified earlier. As shown in the following image, the result section lists the top 50 source IP addresses by traffic volume.

Amazon Athena Saved Queries
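
To give a sense of what a top-talkers query involves, here is a minimal sketch that builds a comparable query and submits it with the AWS CLI. It is not the saved query that the template creates. The helper names are hypothetical, the database and table arguments are placeholders for the names created by your stack, and we assume that the table exposes srcaddr and bytes columns that match the flow log fields.

```shell
# Sketch only: sums bytes per source address and keeps the largest senders.
# The second argument is the number of rows to return (default 50).
top_talkers_sql() {
  local table="$1" limit="${2:-50}"
  printf 'SELECT srcaddr, SUM(bytes) AS total_bytes FROM %s GROUP BY srcaddr ORDER BY total_bytes DESC LIMIT %s' \
    "$table" "$limit"
}

# Submit the query in the workgroup created by the template and print the
# query execution ID that Athena returns.
run_top_talkers() {
  local database="$1" table="$2"
  aws athena start-query-execution \
    --work-group TGWFlowLogsQueryWorkGroup \
    --query-execution-context "Database=${database}" \
    --query-string "$(top_talkers_sql "$table")" \
    --query 'QueryExecutionId' \
    --output text
}

# Usage (requires the AWS CLI and credentials; names are placeholders):
#   run_top_talkers my_database my_table
```

The query runs asynchronously, so the command returns an execution ID and does not return rows. You can fetch the rows later with aws athena get-query-results --query-execution-id, or view them in the Athena console.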

Note that the queries created for you by the CloudFormation template depend on the Transit Gateway Flow Logs fields that are enabled in your flow logs subscription. To get the most flexibility, make sure that you enable all relevant fields when you create your flow logs subscription, and modify the CloudFormation template to reflect the same fields.

Cleaning up

To avoid ongoing charges for the resources you created, complete the following steps:

  • Navigate to the CloudFormation console and delete the stack that you deployed with the CloudFormation template (referenced in Step 1).
  • Delete the Transit Gateway Flow Logs subscription.
  • Delete the S3 bucket for the Transit Gateway Flow Logs.
  • If you created a new VPC and new resources in the VPC, then delete the resources and VPC.

Conclusion

You can now get started with using Athena to analyze Transit Gateway Flow Logs stored in Amazon S3, without manually creating an Athena table or partitioning and loading data into it. The CloudFormation template provided in this post automates these initial steps. It also creates a set of named queries in Athena that help you analyze Transit Gateway Flow Logs data and gain insights about your AWS environment based on network traffic data.

Chaitanya Shah

Chaitanya Shah is a Principal Technical Account Manager with AWS, based out of New York. He loves to code and actively contributes to the AWS solutions labs to help customers solve complex problems. He provides guidance to AWS customers on best practices for their cloud migrations. He also specializes in AWS data transfer and in the data and analytics domain.

Nishant Kumar

Nishant Kumar is a Senior Product Manager in the Amazon VPC team. He is interested in areas of network observability and network management. Outside work, he loves Formula 1 racing, cooking, and exploring wildlife.