Analyze Network Traffic of Amazon Virtual Private Cloud (VPC) by CIDR blocks

An update was made on October 15, 2024: With the release of Athena engine version 3, native support for IP address functions is available through the Trino project. This eliminates the need for the Lambda function approach outlined in this blog post. To take advantage of this new enhancement, it is necessary to update the outlined Athena queries.


AWS enterprise customers use hundreds of accounts and Amazon Virtual Private Clouds (Amazon VPCs) to segment their workloads and expand their footprint. This level of scale can lead to challenges regarding resource sharing, inter-VPC connectivity, on-premises to VPC connectivity, and overall network configurations.

Certain scenarios may cause customers to misconfigure their network settings and routing. This can lead to network traffic being inadvertently routed from one private VPC through the on-premises network (over AWS Direct Connect) to another private VPC. These VPCs shouldn’t connect with each other, or they should only connect through approved VPC peering or AWS Transit Gateway. This unintentional routing not only creates additional security threats, but also increases network latency by introducing additional network hops. Furthermore, maintaining these connections becomes an enterprise-wide concern, as any change to the network configuration could impact communication between applications located in two separate VPCs.

In another scenario, as part of an organizational split, customers may need to disconnect existing peered private VPCs. If services in these VPCs are communicating with each other, then disconnecting the VPCs can cause service disruptions. This means you must analyze the interdependencies between the VPCs before making any network changes.

You can identify these interdependencies by analyzing the traffic that flows between a VPC and another VPC’s Classless Inter-Domain Routing (CIDR) blocks.

To understand the problem better, consider the example scenario in the following figure. You have three VPCs, VPC-A (CIDR: 10.0.0.0/16), VPC-B (CIDR 10.1.0.0/16), and VPC-C (CIDR 10.2.0.0/16). All of these VPCs are private VPCs and are communicating with each other over the AWS private network. VPC-A and VPC-B are peered VPCs. VPC-A and VPC-C are connected to an on-premises data center through AWS Direct Connect. VPC-A and VPC-C aren’t connected with each other and aren’t supposed to connect.

You want to analyze the network traffic of VPC-A for the following use cases:

Scenario 1: You want to analyze unintentional routing in and out of VPC-A. For that, you must identify all of the external resources that VPC-A is communicating with by excluding the VPC-A’s CIDR block.

Scenario 2: You want to disconnect the peering of VPC-A and VPC-B. For that, you must identify which specific resources (IP addresses) in VPC-B are communicating with VPC-A.

Scenario 3: VPC-C and VPC-A aren’t supposed to connect with each other. To detect a violation of this policy, you would need to identify any resource(s) in VPC-C that are inadvertently communicating with resources in VPC-A.

Scenario 4: You want to analyze the VPC’s internal-only traffic to identify how the internal resources depend on each other.

Scenario 5: You want to analyze only the incoming traffic from VPC-B to VPC-A.

Figure 1: Example VPC architecture
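To make the five scenarios concrete, the following is a minimal Python sketch that labels a single flow by the CIDR block its endpoints belong to. It isn't part of the solution itself. The function names are hypothetical, and it assumes that the flow records come from VPC-A’s flow logs and that the three example CIDR blocks from Figure 1 are the only ones of interest.

```python
import ipaddress

# CIDR blocks from the example architecture in Figure 1.
VPC_CIDRS = {
    "VPC-A": ipaddress.ip_network("10.0.0.0/16"),
    "VPC-B": ipaddress.ip_network("10.1.0.0/16"),
    "VPC-C": ipaddress.ip_network("10.2.0.0/16"),
}


def owner_of(ip: str) -> str:
    """Return the name of the VPC whose CIDR block contains ip, or 'external'."""
    address = ipaddress.ip_address(ip)
    for name, network in VPC_CIDRS.items():
        if address in network:
            return name
    return "external"


def classify_flow(srcaddr: str, dstaddr: str) -> str:
    """Label one flow record from VPC-A's logs according to the scenarios."""
    src, dst = owner_of(srcaddr), owner_of(dstaddr)
    if src == dst == "VPC-A":
        return "internal to VPC-A"        # Scenario 4
    if "VPC-C" in (src, dst):
        return "policy violation: VPC-C"  # Scenario 3
    if src == "VPC-B":
        return "incoming from VPC-B"      # Scenarios 2 and 5
    if dst == "VPC-B":
        return "outgoing to VPC-B"        # Scenario 2
    return "other external traffic"       # Scenario 1


print(classify_flow("10.2.0.146", "10.0.0.151"))
```

The rest of this post applies the same membership test at scale with SQL over the VPC Flow Logs data instead of checking records one at a time.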

In this post, we’ll show you how to analyze VPC traffic via a particular CIDR block, and address all of the scenarios mentioned above using VPC Flow Logs, Amazon Athena, and AWS Lambda.

Solution overview

As part of our solution, you’ll first enable VPC Flow Logs on your targeted VPC. VPC Flow Logs is a feature that lets you capture information about the IP traffic going to and from your VPC network interfaces. VPC Flow Logs data can be published to Amazon CloudWatch Logs or Amazon Simple Storage Service (Amazon S3). In this solution, we will choose Amazon S3 as the destination.

Next, you will enable Athena. Athena is an interactive query service that makes it easy to analyze data directly in Amazon S3 by using standard SQL. Create an Athena table “vpcflowlogs” with the location set to the VPC Flow Logs destination in the Amazon S3 bucket.

VPC Flow Logs helps you identify the network traffic information mapped to specific IP addresses in your VPC. However, to filter traffic by a particular CIDR block or multiple CIDR blocks, you must create an additional Athena table that maps the CIDR block to a corresponding range of IP addresses. This mapping between the CIDR block and the corresponding range of IP addresses is generated using a Lambda function.

Next, you will create the Lambda function. Lambda is a serverless, event-driven compute service that lets you run code for virtually any type of application or backend service without provisioning or managing servers. This particular Lambda function takes a CIDR block as input and writes the corresponding list of IP addresses as a .csv file to a separately created Amazon S3 bucket. Then, you’ll create an Athena table named “cidrtable” with its location set to that separately created Amazon S3 bucket.
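The expansion that the Lambda function performs can be sketched with Python’s standard ipaddress module. This is an illustration of the idea only, not the code that the CloudFormation template deploys. The function name and the header row are assumptions, and the upload to Amazon S3 is left out.

```python
import csv
import io
import ipaddress


def cidr_to_csv(cidr: str) -> str:
    """Expand a CIDR block into CSV text with one 'CIDR,IP' row per address."""
    network = ipaddress.ip_network(cidr)  # raises ValueError for a malformed CIDR
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["CIDR", "IP"])  # header names are an assumption
    # Iterating a network yields every address, including the network
    # and broadcast addresses.
    for address in network:
        writer.writerow([str(network), str(address)])
    return buffer.getvalue()


# A small block keeps the example readable.
print(cidr_to_csv("10.0.0.0/30"))
```

Note that a /16 block such as 10.0.0.0/16 expands to 65,536 rows, so the size of the mapping file grows quickly for large CIDR blocks.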

Finally, we’ll show how to analyze the traffic between the targeted VPC and a particular CIDR block by using SQL queries that join the cidrtable and vpcflowlogs tables, as shown in the following figure.

Figure 2: Solution Architecture: Filtering CIDR Block from VPC Flow Logs using the Lambda function and CIDR_IP mapping.csv.

Solution walkthrough

Step 1: Enable VPC Flow Logs to publish to the Amazon S3 bucket.

Create the VPC Flow Logs subscription with Amazon S3 as the destination, as illustrated in the VPC Flow Logs user guide. Note that after you create a flow log, it can take several minutes to begin collecting and publishing data to the Amazon S3 bucket. VPC Flow Logs don’t capture real-time log streams for your network interfaces.
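If you prefer to create the flow log programmatically, the following sketch uses the boto3 EC2 client’s create_flow_logs call. The VPC ID and bucket name are placeholders, and capturing all traffic is an assumption. The Athena table created later may expect a specific log format, so follow the user guide for those settings.

```python
def flow_log_request(vpc_id: str, bucket_name: str) -> dict:
    """Build the arguments for EC2 create_flow_logs with Amazon S3 as destination."""
    return {
        "ResourceIds": [vpc_id],
        "ResourceType": "VPC",
        "TrafficType": "ALL",  # assumption: capture accepted and rejected traffic
        "LogDestinationType": "s3",
        "LogDestination": f"arn:aws:s3:::{bucket_name}",
    }


def create_vpc_flow_log(vpc_id: str, bucket_name: str) -> dict:
    """Create the flow log. Requires boto3 and AWS credentials."""
    import boto3  # imported here so the request builder works without boto3

    ec2 = boto3.client("ec2")
    return ec2.create_flow_logs(**flow_log_request(vpc_id, bucket_name))


# Placeholder identifiers, for illustration only.
print(flow_log_request("vpc-0123456789abcdef0", "my-flow-logs-bucket"))
```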

Step 2: Create the Athena database, Athena tables, and Lambda function

    1. Start by navigating to AWS CloudFormation in the AWS Console, in the account and Region where your VPC Flow Logs subscription was created.
    2. Download the CloudFormation template cfn_vpcflowlogs_cidr.json from the GitHub repository.
    3. Choose Create Stack.
    4. Choose the downloaded cfn_vpcflowlogs_cidr.json file, as shown in the following figure.

Figure 3: Choose the downloaded cfn_vpcflowlogs_cidr.json file

    5. Enter the stack name.
    6. Specify the name of the Amazon S3 bucket that is set as the VPC Flow Logs destination.
    7. Projection start date: This is the earliest date from which you want to query the VPC Flow Logs records.

Figure 4: Specify stack name, VPC Flow Logs bucket and Projection start date

  8. Select Next twice, and on the final screen check the box that allows CloudFormation to create the AWS Identity and Access Management (IAM) resources before selecting Create Stack.
  9. Go to the Resources section of the CloudFormation stack. The stack creates the following resources, as shown in the following figure:
    • Athena database “vpcflowlogscidrdb”.
    • Athena table “vpcflowlogs”, with the location set to the VPC Flow Logs destination in the Amazon S3 bucket.
    • Lambda function “cidr_ip_generator”.
    • Amazon S3 bucket. The Lambda function generates the list of IP addresses for each CIDR block and uploads it to this bucket.
    • Athena table “cidrtable”, with the location set to the Amazon S3 bucket where the per-CIDR lists of IP addresses are stored.
    • IAM roles and Amazon S3 bucket policies.

Figure 5: Resources created by CloudFormation template

Step 3: Generate the range of IP addresses corresponding to your VPC CIDR blocks

Now, with the necessary infrastructure created, you can use the Lambda function to generate the list of IP addresses for each of your VPC CIDR blocks.

    1. Go to the Resources section of the CloudFormation stack.
    2. Select the Lambda Function cidr_ip_generator, as highlighted in the following figure.

Figure 6: Lambda function cidr_ip_generator

    3. The Lambda function cidr_ip_generator will open in a separate browser window. Choose the Test tab, and select Configure event as shown in the following figure.

Figure 7: Configure Lambda Test Event

    4. Configure the test event and pass your VPC CIDR block as the “cidr” parameter, as shown in the following figure. Then, run the Lambda function with the test event. The function generates a <CIDRBLOCK>.csv file for the input CIDR block in the Amazon S3 bucket. If you expand the Execution result of the Lambda function, you can see the name of the Amazon S3 bucket where the function wrote the <CIDRBLOCK>.csv file, as highlighted in Figure 9. The <CIDRBLOCK>.csv file has two columns, CIDR and IP, as shown in Figure 10.

Figure 8: Pass “cidr” block as JSON event

Figure 9: Lambda execution result displays the processed CIDR block and the Amazon S3 bucket name

Figure 10: Layout of <CIDRBLOCK>.csv file

  5. Optionally, you can run the Lambda function multiple times, once for each VPC CIDR block whose traffic you want to analyze. You can also invoke the Lambda function programmatically and pass a JSON event with the “cidr” parameter instead of running it as a test event.
  6. Next, we’ll analyze the VPC traffic by CIDR block for different scenarios.
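As a sketch of the programmatic invocation mentioned above, the following example builds the JSON event and calls the function through the boto3 Lambda client. It assumes that the deployed function is reachable under the name cidr_ip_generator (check the Resources section of your stack for the actual name) and that the event needs only the “cidr” key. The helper names are hypothetical.

```python
import ipaddress
import json


def build_event(cidr: str) -> dict:
    """Build the JSON event for the Lambda function after validating the CIDR."""
    # Raises ValueError for a malformed CIDR or one with host bits set.
    ipaddress.ip_network(cidr)
    return {"cidr": cidr}


def invoke_generator(cidr: str, function_name: str = "cidr_ip_generator"):
    """Invoke the Lambda function synchronously. Requires boto3 and AWS credentials."""
    import boto3  # imported here so build_event works without boto3

    client = boto3.client("lambda")
    response = client.invoke(
        FunctionName=function_name,  # assumption: verify the name in your stack
        InvocationType="RequestResponse",
        Payload=json.dumps(build_event(cidr)).encode("utf-8"),
    )
    return json.loads(response["Payload"].read())


# Build one event per CIDR block from the example architecture.
for block in ("10.0.0.0/16", "10.1.0.0/16", "10.2.0.0/16"):
    print(json.dumps(build_event(block)))
```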

Step 4: Analyze VPC network traffic via CIDR block by using Athena SQL

  1. Open the Athena console, and select the database “vpcflowlogscidrdb”.
  2. If you’re using Athena for the first time, then you must set up a query result location. Once the query result location is set up, you can run the Athena queries for each scenario.

Figure 11: Select Athena Database vpcflowlogscidrdb
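The queries in this step can also be submitted outside the console. The following sketch uses the boto3 Athena client’s start_query_execution call and passes the query result location explicitly. The result location is a placeholder, and the helper names are hypothetical.

```python
def athena_request(sql: str, output_location: str,
                   database: str = "vpcflowlogscidrdb") -> dict:
    """Build the arguments for Athena start_query_execution."""
    if not output_location.startswith("s3://"):
        raise ValueError("the query result location must be an s3:// URI")
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(sql: str, output_location: str) -> str:
    """Submit the query and return its execution ID. Requires boto3 and AWS credentials."""
    import boto3  # imported here so the request builder works without boto3

    athena = boto3.client("athena")
    response = athena.start_query_execution(**athena_request(sql, output_location))
    return response["QueryExecutionId"]


# Placeholder result location, for illustration only.
print(athena_request("SELECT 1", "s3://my-athena-results/vpc-cidr/"))
```

The call returns as soon as the query is accepted, so a complete script would still need to poll for the query status before reading the results.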

Scenario 1: The following query returns all of the external IP addresses that VPC-A has communicated with over the last 10 days, excluding the VPC’s internal network traffic.

Note that if the VPC has a public subnet using resources with public IP addresses that are outside of VPC-A’s CIDR block (10.0.0.0/16), then the following query will treat those IPs as external IP addresses.

-- Flows with at least one endpoint outside VPC-A (10.0.0.0/16), last 10 days
SELECT * FROM vpcflowlogscidrdb.vpcflowlogstable
WHERE NOT (dstaddr IN (SELECT ip FROM vpcflowlogscidrdb.cidrtable WHERE CIDR = '10.0.0.0/16')
AND srcaddr IN (SELECT ip FROM vpcflowlogscidrdb.cidrtable WHERE CIDR = '10.0.0.0/16'))
AND TO_DATE(day, 'yyyy/mm/dd') > CURRENT_DATE - INTERVAL '10' DAY

Result:

The following result shows all of the external IP addresses that are either receiving or sending network traffic to VPC-A.

Figure 12: Result shows all of the external IP addresses that are either receiving or sending network traffic to VPC-A

Scenario 2: You want to disconnect the peering of VPC-A and VPC-B. For that, you must identify which specific resources (IP addresses) in VPC-B have communicated with VPC-A over the last 10 days.

-- Flows whose source or destination lies in VPC-B (10.1.0.0/16), last 10 days
SELECT * FROM vpcflowlogscidrdb.vpcflowlogstable
WHERE (dstaddr IN (SELECT ip FROM vpcflowlogscidrdb.cidrtable WHERE CIDR = '10.1.0.0/16')
OR srcaddr IN (SELECT ip FROM vpcflowlogscidrdb.cidrtable WHERE CIDR = '10.1.0.0/16'))
AND TO_DATE(day, 'yyyy/mm/dd') > CURRENT_DATE - INTERVAL '10' DAY

Result:

The following result shows all of the source and destination IP addresses involved in traffic between VPC-A and VPC-B. The example results show that VPC-A depends on the resource with the IP address 10.1.0.123 in VPC-B. Before you disconnect VPC-A and VPC-B, you must address this dependency on the resource with the IP address 10.1.0.123.

Figure 13: Result shows the source and destination IP addresses involved in traffic between VPC-A and VPC-B

Scenario 3: You want to identify any resources in VPC-C that are inadvertently communicating with VPC-A. The following query returns all of the traffic between VPC-A and VPC-C over the last 10 days.

-- Flows whose source or destination lies in VPC-C (10.2.0.0/16), last 10 days
SELECT * FROM vpcflowlogscidrdb.vpcflowlogstable
WHERE (dstaddr IN (SELECT ip FROM vpcflowlogscidrdb.cidrtable WHERE CIDR = '10.2.0.0/16')
OR srcaddr IN (SELECT ip FROM vpcflowlogscidrdb.cidrtable WHERE CIDR = '10.2.0.0/16'))
AND TO_DATE(day, 'yyyy/mm/dd') > CURRENT_DATE - INTERVAL '10' DAY

Result:

The results show that the source with the IP address 10.2.0.146 in VPC-C is inadvertently communicating with the VPC-A resource with the IP address 10.0.0.151.

Figure 14: Result shows resources in VPC-C that are inadvertently communicating with VPC-A

Scenario 4: You want to identify the VPC internal-only traffic. The following query provides information on VPC-A’s internal-only traffic over the period of the last 10 days.

Note that if the VPC is using resources with public IP addresses that are outside of VPC-A’s CIDR block (10.0.0.0/16), then the following query won’t be able to capture traffic going in and out of these public IPs.

-- Flows whose source and destination both lie in VPC-A (10.0.0.0/16), last 10 days
SELECT * FROM vpcflowlogscidrdb.vpcflowlogstable
WHERE (dstaddr IN (SELECT ip FROM vpcflowlogscidrdb.cidrtable WHERE CIDR = '10.0.0.0/16')
AND srcaddr IN (SELECT ip FROM vpcflowlogscidrdb.cidrtable WHERE CIDR = '10.0.0.0/16'))
AND TO_DATE(day, 'yyyy/mm/dd') > CURRENT_DATE - INTERVAL '10' DAY

Result:

The following result shows that the resource with the IP address 10.0.0.151 is communicating with the Amazon Elastic Compute Cloud (Amazon EC2) instance 10.0.0.175.

Figure 15: Result shows VPC-A’s internal-only traffic

Scenario 5: The following query returns all of the incoming traffic from VPC-B to VPC-A over the last 10 days.

-- Flows whose source address lies in VPC-B (10.1.0.0/16), last 10 days
SELECT * FROM vpcflowlogscidrdb.vpcflowlogstable
JOIN vpcflowlogscidrdb.cidrtable ON srcaddr = ip AND CIDR = '10.1.0.0/16'
AND TO_DATE(day, 'yyyy/mm/dd') > CURRENT_DATE - INTERVAL '10' DAY

Result:

The following result shows all of the incoming traffic from VPC-B to VPC-A.

Figure 16: Result shows all of the incoming traffic from VPC-B to VPC-A

These are sample queries. You can modify them to include or exclude traffic for your target CIDR blocks, and you can adjust the number of days in the interval based on your requirements.
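One way to vary the CIDR block and the number of days is to render the queries from a small helper, as in the following sketch. The function and mode names are hypothetical. The native option follows the update note at the top of this post: it assumes that Athena engine version 3 exposes Trino’s contains(network, address) function, and it uses TRY_CAST so that records without a valid IP address are skipped. Treat that variant as untested against your tables.

```python
import ipaddress

FLOW_TABLE = "vpcflowlogscidrdb.vpcflowlogstable"
CIDR_TABLE = "vpcflowlogscidrdb.cidrtable"


def in_cidr(column: str, cidr: str, native: bool = False) -> str:
    """Render a SQL predicate that is true when column falls inside cidr."""
    cidr = str(ipaddress.ip_network(cidr))  # validate before interpolating
    if native:
        # Assumption: Trino's contains(network, address) is available.
        return f"contains('{cidr}', TRY_CAST({column} AS IPADDRESS))"
    return f"{column} IN (SELECT ip FROM {CIDR_TABLE} WHERE CIDR = '{cidr}')"


def build_query(cidr: str, days: int = 10, mode: str = "either",
                native: bool = False) -> str:
    """Render one of the scenario queries for a CIDR block and day interval."""
    if days < 1:
        raise ValueError("days must be at least 1")
    src = in_cidr("srcaddr", cidr, native)
    dst = in_cidr("dstaddr", cidr, native)
    predicates = {
        "either": f"({dst} OR {src})",         # Scenarios 2 and 3
        "internal": f"({dst} AND {src})",      # Scenario 4
        "external": f"NOT ({dst} AND {src})",  # Scenario 1
        "incoming": src,                       # Scenario 5, without the JOIN
    }
    if mode not in predicates:
        raise ValueError(f"unknown mode: {mode}")
    return (
        f"SELECT * FROM {FLOW_TABLE}\n"
        f"WHERE {predicates[mode]}\n"
        f"AND TO_DATE(day, 'yyyy/mm/dd') > CURRENT_DATE - INTERVAL '{int(days)}' DAY"
    )


print(build_query("10.1.0.0/16", days=7, mode="incoming"))
print(build_query("10.0.0.0/16", mode="external", native=True))
```

The incoming mode filters on srcaddr with a subquery instead of the JOIN used in Scenario 5, so it returns the same flow records without the extra CIDR and IP columns.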

Clean up

Step 1: Delete the VPC Flow Logs

If you have created VPC Flow Logs specifically for this solution, then you can delete the VPC Flow Logs by following the instructions provided here.

Step 2: Delete the CloudFormation Stack

Delete the CloudFormation stack that you deployed.

Conclusion

In this post, we showed how you can analyze traffic going in and out of your VPC from a particular CIDR block by using VPC Flow Logs, Athena, and Lambda. We also showed how you can identify the specific IP dependencies between peered private VPCs. Furthermore, we showed how you can analyze the unintentional routing in and out of the VPC over private IP addresses.

To learn more about VPC Flow Logs, see this link.

To learn more about Querying VPC Flow Logs using Athena, see this link.

About the authors

Abhijit Rajeshirke

Abhijit Rajeshirke is a Solutions Architect for Enterprise customers at AWS. His core focus areas are Data Analytics, Big Data, and Serverless technologies. Outside of work, he enjoys taking long, mindful walks on any available track or trail.

Charu Singh

Charu Singh is a Software Development Engineer for VPC Flow Logs at AWS.

Hooman Rashedi (Guest)

Hooman is a Cloud Principal Architect specializing in AWS Cloud architecture and application cloud modernization. He worked for Nielsen for 18 years, and is currently working with AWS Professional Services Premiere partner ‘Presidio’. Hooman has worked on multiple AWS projects for the past 6 years and enjoys helping customers with Cloud migration and modernization of their technology stack.