
Understand AWS data transfer details in depth from the Cost and Usage Report using Amazon Athena queries and Amazon QuickSight

Keeping applications up and running continuously requires architecting them to prevent downtime, and being able to recover from failure as quickly as possible with minimal data loss, so that you meet your Recovery Time Objective (RTO) and Recovery Point Objective (RPO). AWS helps you achieve high availability for cloud workloads across multiple dimensions, such as compute, database, and storage services. Data transfer cost is an important input to architectural decisions as customers design highly available, fault-tolerant, and resilient applications in the cloud to meet their business requirements. This post shows you how to use data stored in the Cost and Usage Report (CUR) to quickly find the resources behind your bandwidth costs, so that you can make the right architecture decisions and reduce your data transfer costs.

AWS records the data transfer bandwidth used by your applications running in AWS in the CUR. A summary of data transfer usage and cost also appears in your monthly invoice, which gives you an overall picture of your data transfer spend by Region and bandwidth used. The CUR, on the other hand, provides granular detail across many dimensions. To take advantage of that detail, you need to understand the data transfer type, usage type, and operation in the CUR records. This deepens your understanding of your bandwidth cost and shows you which resources generate it.

Although data transfer charges appear in the CUR under the product family “Data Transfer”, services such as NAT Gateway, AWS Transit Gateway, and AWS PrivateLink (interface endpoints, Gateway Load Balancer) record data processing charges for the data transferred through them under their respective service names.
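To make this split concrete, here is a minimal Python sketch that flags CUR rows as data-transfer related either by product family or by a data processing usage type suffix. The suffix list and the sample rows are assumptions for illustration, not an exhaustive catalog; in Athena you would express the same filter as a WHERE clause on product_product_family and line_item_usage_type.

```python
# Hypothetical helper: select CUR line items that relate to data transfer,
# including data processing charges recorded under NAT Gateway, Transit
# Gateway, PrivateLink (VPC endpoint), and load balancer usage types.
# The suffix list is an illustrative assumption, not an exhaustive catalog.
DATA_PROCESSING_SUFFIXES = (
    "NatGateway-Bytes",
    "TransitGateway-Bytes",
    "VpcEndpoint-Bytes",
    "DataProcessing-Bytes",
)


def is_data_transfer_related(row: dict) -> bool:
    """True if the row is in the "Data Transfer" product family, or its
    usage type ends with one of the data processing suffixes above."""
    if row.get("product_product_family") == "Data Transfer":
        return True
    return row.get("line_item_usage_type", "").endswith(DATA_PROCESSING_SUFFIXES)


# Made-up rows that use real CUR column names.
rows = [
    {"product_product_family": "Data Transfer",
     "line_item_usage_type": "APS1-DataTransfer-Regional-Bytes"},
    {"product_product_family": "NAT Gateway",
     "line_item_usage_type": "USE1-NatGateway-Bytes"},
    {"product_product_family": "Compute Instance",
     "line_item_usage_type": "BoxUsage:t3.micro"},
]
selected = [r["line_item_usage_type"] for r in rows if is_data_transfer_related(r)]
print(selected)  # ['APS1-DataTransfer-Regional-Bytes', 'USE1-NatGateway-Bytes']
```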

I will show you how AWS categorizes various data transfer types, along with the important data transfer fields in the CUR. You will learn how to fetch data with SQL queries in Amazon Athena, and how to visualize it with the Data Transfer Cost and Usage Analysis Dashboard from Well-Architected Labs (WA Labs), built in Amazon QuickSight, to understand where these costs come from.

Which AWS tools can be used to visualize data transfer costs?

You can visualize data transfer usage and costs in various ways. AWS Cost Explorer is a great tool for visualizing daily, monthly, and forecasted spend by combining its available filters and grouping by service, linked account, and other dimensions. Refer to this post for more details.

Often, you want to understand which specific resources, operations, or AWS services contribute to data transfer costs. You may also want to drill down to hourly usage, understand patterns, visualize them in meaningful QuickSight charts, or run a query that fetches results for specific services, applications, or tags. Athena makes all of this easy. But before we get into the queries or the QuickSight charts, let’s look at the benefits of using the CUR, what data transfer is, and the data transfer types and usage types associated with it.

Benefits of using the CUR for cost and usage analysis

  • Stores data at hourly granularity
  • Long-term retention (the only cost is storing the CUR in Amazon Simple Storage Service (Amazon S3))
  • Single view across multiple accounts when used in the Data Transfer dashboard in QuickSight
  • Resource IDs (ARN (Amazon Resource Name), ID, or name, depending on the resource type)
  • Custom, shareable, and embeddable QuickSight dashboards built on top of the CUR data
  • Ability to join with other data sources in Athena (e.g., AWS account IDs and account names)
  • Ability to apply transformations to the CUR (e.g., creating meaningful views, applying calculations to the cost and bytes columns, and joining with other Athena tables)
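As an illustration of the hourly granularity and the join capability listed above, the following sketch uses Python's built-in sqlite3 module as a stand-in for Athena. The account_names table, the account IDs, and the costs are hypothetical. In Athena, line_item_usage_start_date is a timestamp, so there you would typically use date_trunc('day', line_item_usage_start_date) rather than the substr() call used here on text dates.

```python
import sqlite3

# Illustrative only: an in-memory SQLite database stands in for Athena.
# "cur" mimics a few CUR columns; "account_names" is a hypothetical lookup
# table you might maintain yourself.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cur (
    line_item_usage_account_id TEXT,
    line_item_usage_start_date TEXT,
    line_item_usage_type TEXT,
    line_item_unblended_cost REAL
);
CREATE TABLE account_names (account_id TEXT, account_name TEXT);
""")
conn.executemany("INSERT INTO cur VALUES (?, ?, ?, ?)", [
    ("111111111111", "2023-01-01 00:00:00", "USE1-DataTransfer-Regional-Bytes", 0.25),
    ("111111111111", "2023-01-01 01:00:00", "USE1-DataTransfer-Regional-Bytes", 0.50),
    ("222222222222", "2023-01-01 00:00:00", "DataTransfer-Out-Bytes", 1.00),
])
conn.executemany("INSERT INTO account_names VALUES (?, ?)", [
    ("111111111111", "payments-prod"),
    ("222222222222", "analytics-dev"),
])

# Roll hourly line items up to a daily cost per account name.
query = """
SELECT a.account_name,
       substr(c.line_item_usage_start_date, 1, 10) AS usage_day,
       SUM(c.line_item_unblended_cost) AS cost
FROM cur c
JOIN account_names a ON a.account_id = c.line_item_usage_account_id
GROUP BY a.account_name, usage_day
ORDER BY cost DESC
"""
results = conn.execute(query).fetchall()
for row in results:
    print(row)
# ('analytics-dev', '2023-01-01', 1.0)
# ('payments-prod', '2023-01-01', 0.75)
```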

Understand data transfer types and usage types in the CUR

A pragmatic way to understand data transfer is through the data transfer types and usage types under which costs are recorded. Once you understand which usage_type and operation contribute to the data transfer cost, you can narrow down the specific architecture pattern that could be changed to avoid high data transfer charges, while still meeting business requirements.

The following table lists the major types that contribute to data transfer usage. Usage Type (line_item_usage_type) and Data Transfer Type (product_transfer_type) are both available as fields in the CUR (see the CUR data dictionary).

Category | Data Transfer Type | Usage Type | Description
Internet | AWS Inbound/Outbound | DataTransfer-[In/Out]-Bytes or {SOURCE}-DataTransfer-[In/Out]-Bytes | Ingress/egress data transfer to and from the internet
Direct Connect | InterRegion Inbound/Outbound, IntraRegion Inbound/Outbound | {SOURCE}-[POP]-DataXfer-[In/Out] or {SOURCE}-[POP]-DataXfer-[In/Out]:dc.3 | Direct Connect (DX) traffic, i.e., traffic over a physical connection to the customer’s on-premises data center over a public VIF
CloudFront | InterRegion Inbound/Outbound | CloudFront-In-Bytes | Data transfer to us-east-1 from CloudFront
CloudFront | InterRegion Inbound/Outbound | CloudFront-Out-Bytes | Data transfer to CloudFront from us-east-1
CloudFront | InterRegion Inbound/Outbound | {REGION}-CloudFront-[In/Out]-Bytes | Data transfer from an AWS Region to CloudFront, or from CloudFront to an AWS Region
CloudFront | InterRegion Inbound/Outbound China | {SOURCE}-CloudFrontChina-[In/Out]-Bytes | Traffic between an AWS Region and a CloudFront PoP in China
CloudFront | CloudFront Outbound | {REGION}-DataTransfer-Out-Bytes | CloudFront charges incurred when CloudFront responds to requests for your objects, serving them from the edge location to the end user; includes WebSocket data from the server to the client
CloudFront | CloudFront to Origin | {REGION}-DataTransfer-Out-OBytes | CloudFront charges incurred when users transfer data to your origin, including DELETE, OPTIONS, PATCH, POST, and PUT requests; includes WebSocket data from the client to the server
Region | IntraRegion | DataTransfer-Regional-Bytes or {SOURCE}-DataTransfer-Regional-Bytes | Amazon EC2 traffic that moves between AZs but stays within the same Region. If the usage type has no {SOURCE} prefix, it defaults to us-east-1. Includes the InterZone-In/Out, LoadBalancing, LoadBalancingPublicIP-In/Out, PublicIP-In/Out, VpcEndpoint, and VPCPeering-In/Out operations
Region | IntraRegion | {SOURCE}-DataTransfer-xAZ-[In/Out]-Bytes | Cross-AZ data transfer to or from a VPC endpoint, Transit Gateway, or Client VPN
Region | IntraRegion-VPCpeering | {SOURCE}-DataTransfer-AZ-[In/Out]-Bytes | Data transfer within the same AZ over a private IP
Region | IntraRegion-VPCpeering | DataTransfer-Regional-Bytes or {SOURCE}-DataTransfer-Regional-Bytes | Data transfer across AZs over a private IP
Region | InterRegion Inbound/Outbound | {SOURCE}-{DESTINATION}-AWS-Out-Bytes | Data transfer out from the source AWS Region to the destination AWS Region
Region | InterRegion Inbound/Outbound | {DESTINATION}-{SOURCE}-AWS-In-Bytes | Data transfer into the destination AWS Region from the source AWS Region
Region | Inter Region Peering Data Transfer Inbound/Outbound | {DESTINATION}-AWS-In-Bytes | Data transfer into the destination AWS Region from another AWS Region, for a peered VPC
Region | Inter Region Peering Data Transfer Inbound/Outbound | {SOURCE}-AWS-Out-Bytes | Data transfer out from the source AWS Region to another AWS Region, for a peered VPC
S3/DDB/SNS/SQS | Accelerated AWS Outbound from close by location, Accelerated InterRegion Outbound using edge locations outside {Regions}, InterRegion Inbound/Outbound | DataTransfer-Out-ABytes-[T1/T2] | Accelerated Amazon S3 data transfer out from a nearby location
S3/DDB/SNS/SQS | Accelerated AWS Outbound from close by location, Accelerated InterRegion Outbound using edge locations outside {Regions}, InterRegion Inbound/Outbound | {SOURCE}-{DESTINATION}-S3RTC-[In/Out]-Bytes | Cross-Region replication data transfer from the source AWS Region to the destination AWS Region with S3 Replication Time Control (RTC)
S3 | Accelerated AWS Outbound from close by location, Accelerated InterRegion Outbound using edge locations outside {Regions} | {DESTINATION}-{SOURCE}-AWS-In-ABytes-[T1/T2] | Accelerated inbound data transfer to Amazon S3 in the destination AWS Region
S3 | Accelerated AWS Outbound from close by location, Accelerated InterRegion Outbound using edge locations outside {Regions} | {SOURCE}-{DESTINATION}-AWS-Out-ABytes-[T1/T2] | Accelerated data transfer from Amazon S3 to the destination AWS Region
AWS Global Accelerator | - | {DESTINATION}-{SOURCE}-IN-Bytes-Internet | Data transfer premium for IN bytes from source internet clients to the destination AWS Region
AWS Global Accelerator | - | {SOURCE}-{DESTINATION}-OUT-Bytes-Internet | Data transfer premium for OUT bytes from the source AWS Region to destination internet clients
Load Balancers | IntraRegion | DataTransfer-Regional-Bytes or {SOURCE}-DataTransfer-Regional-Bytes | Internal ALB or CLB to target instances
Data Processing-NAT Gateway | - | {SOURCE}-NatGateway-Bytes or NatGateway-Bytes | Bytes of data processed by NAT gateways in the source AWS Region
Data Processing-Load Balancers | - | {SOURCE}-DataProcessing-Bytes or DataProcessing-Bytes | Bytes of data processed by the load balancer in the source AWS Region
Data Processing-Transit Gateway | - | {SOURCE}-TransitGateway-Bytes | Bytes of data processed by a Transit Gateway VPC attachment, Direct Connect attachment, or VPN attachment
Data Processing-VPC Endpoint, PrivateLink | - | {SOURCE}-VpcEndpoint-Bytes | Bytes of data processed by VPC endpoints in the source AWS Region
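The {SOURCE} and {DESTINATION} prefixes in the table can be pulled apart programmatically. The sketch below assumes Region prefixes look like USE1 or APS1 (capital letters followed by a digit); that pattern is an assumption for this sketch, not a documented grammar, so review any usage types it does not recognize.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Assumption: Region prefixes look like USE1, APS1, EUC1 (2-4 capital letters
# followed by a digit). Older prefixes without a digit (e.g. "EU") are not
# recognized by this sketch and end up in the remainder.
_REGION = r"[A-Z]{2,4}\d"
_USAGE_TYPE = re.compile(
    rf"^(?:(?P<first>{_REGION})-)?(?:(?P<second>{_REGION})-)?(?P<rest>.+)$"
)


class UsageType(NamedTuple):
    first_region: Optional[str]   # first prefix: {SOURCE} or {DESTINATION}
    second_region: Optional[str]  # second prefix, if present
    rest: str                     # remainder, e.g. "AWS-Out-Bytes"


def parse_usage_type(usage_type: str) -> UsageType:
    match = _USAGE_TYPE.match(usage_type)
    if match is None:
        raise ValueError(f"unrecognized usage type: {usage_type!r}")
    return UsageType(match.group("first"), match.group("second"), match.group("rest"))


def inter_region_pair(usage_type: str) -> Optional[Tuple[str, str]]:
    """Return (source, destination) for inter-Region usage types, else None.

    Follows the table: {SOURCE}-{DESTINATION}-AWS-Out-... and
    {DESTINATION}-{SOURCE}-AWS-In-...
    """
    parsed = parse_usage_type(usage_type)
    if parsed.second_region is None:
        return None
    if parsed.rest.startswith("AWS-Out-"):
        return parsed.first_region, parsed.second_region
    if parsed.rest.startswith("AWS-In-"):
        return parsed.second_region, parsed.first_region
    return None


print(parse_usage_type("APS1-DataTransfer-Regional-Bytes"))
print(inter_region_pair("EUC1-USE1-AWS-In-Bytes"))  # ('USE1', 'EUC1')
```

Note how the In and Out variants name the same Region pair in opposite order; normalizing both to (source, destination) lets you sum traffic per Region pair regardless of which side recorded it.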

The following table shows sample rows from the Athena table created for the CUR stored in Amazon S3. For example, the first row shows data transferred from Asia Pacific (Singapore), i.e., ap-southeast-1, to an External endpoint for a VPC peering operation, and the second row shows data transferred within the ap-southeast-1 Region by Amazon EC2.

Refer to the CUR data dictionary for more details on column names in the following header:

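Note: although the usage type names end in “Bytes”, line_item_usage_amount for data transfer is reported in the unit shown in the pricing_unit column (GB).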

line_item_product_code | line_item_operation | product_region | line_item_usage_type | product_from_location | product_to_location | line_item_usage_amount | line_item_unblended_cost | product_transfer_type
AmazonEC2 | VPCPeering-Out | ap-southeast-1 | APS1-AWS-Out-Bytes | Asia Pacific (Singapore) | External | 0.61 | 5.50E-05 | Inter Region Peering Data Transfer Outbound
AmazonEC2 | InterZone-In | ap-southeast-1 | APS1-DataTransfer-Regional-Bytes | Asia Pacific (Singapore) | Asia Pacific (Singapore) | 0.31 | 3.11E-05 | IntraRegion
AmazonEC2 | LoadBalancing | us-east-1 | DataTransfer-Out-Bytes | US East (N. Virginia) | External | 29.45 | 2.65E-04 | AWS Outbound
AmazonEC2 | RunInstances | us-east-1 | DataTransfer-Out-Bytes | US East (N. Virginia) | External | 117.18 | 1.05E-06 | AWS Outbound
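To see how rows like these roll up, the following sketch loads the four sample rows into an in-memory SQLite database (standing in for Athena) and sums usage and cost by product_transfer_type. The GROUP BY query uses only standard SQL, so a similar query should work against your own CUR table in Athena once you substitute your (database) and (tablename).

```python
import sqlite3

# Illustrative only: SQLite stands in for Athena; the rows are the sample
# rows from the table above (a small subset of CUR columns).
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE cur (
    line_item_product_code TEXT,
    line_item_operation TEXT,
    product_region TEXT,
    line_item_usage_type TEXT,
    product_from_location TEXT,
    product_to_location TEXT,
    line_item_usage_amount REAL,
    line_item_unblended_cost REAL,
    product_transfer_type TEXT
)""")
conn.executemany("INSERT INTO cur VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", [
    ("AmazonEC2", "VPCPeering-Out", "ap-southeast-1", "APS1-AWS-Out-Bytes",
     "Asia Pacific (Singapore)", "External", 0.61, 5.50e-05,
     "Inter Region Peering Data Transfer Outbound"),
    ("AmazonEC2", "InterZone-In", "ap-southeast-1", "APS1-DataTransfer-Regional-Bytes",
     "Asia Pacific (Singapore)", "Asia Pacific (Singapore)", 0.31, 3.11e-05,
     "IntraRegion"),
    ("AmazonEC2", "LoadBalancing", "us-east-1", "DataTransfer-Out-Bytes",
     "US East (N. Virginia)", "External", 29.45, 2.65e-04, "AWS Outbound"),
    ("AmazonEC2", "RunInstances", "us-east-1", "DataTransfer-Out-Bytes",
     "US East (N. Virginia)", "External", 117.18, 1.05e-06, "AWS Outbound"),
])

# Total usage and cost per data transfer type, most expensive first.
summary = conn.execute("""
SELECT product_transfer_type,
       SUM(line_item_usage_amount)   AS usage_amount,
       SUM(line_item_unblended_cost) AS cost
FROM cur
GROUP BY product_transfer_type
ORDER BY cost DESC
""").fetchall()
for transfer_type, usage_amount, cost in summary:
    print(f"{transfer_type:45s} usage={usage_amount:.2f} cost={cost:.2e}")
```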

We recently announced the CUR Query Library, a helpful collection of Athena queries for fetching usage data for various AWS services. Here are some example Athena queries that fetch NAT-related data transfer records from the view:

SELECT * FROM "(database)"."data_transfer_view" where resource_id like '%nat%';

Or from the CUR table:

SELECT * FROM "(database)"."(tablename)" where line_item_resource_id like '%nat%';

Replace (database) and (tablename) with your own database and table names. You can create “data_transfer_view” by following the instructions from WA Labs.
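The LIKE '%nat%' filter matches any resource ID that contains “nat”, so a slightly tighter variant also filters on the NAT data processing usage type and aggregates per gateway. The sketch below runs that query in SQLite purely for illustration; the resource ARNs and amounts are made up. LIKE is case-insensitive for ASCII text in SQLite but case-sensitive in Athena, so keep the pattern's case matching the real usage type (NatGateway-Bytes).

```python
import sqlite3

# Illustrative only: SQLite stands in for Athena, and all rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE cur (
    line_item_resource_id TEXT,
    line_item_usage_type TEXT,
    line_item_usage_amount REAL,
    line_item_unblended_cost REAL
)""")
nat_a = "arn:aws:ec2:us-east-1:111111111111:natgateway/nat-0aaa"
nat_b = "arn:aws:ec2:us-east-1:111111111111:natgateway/nat-0bbb"
conn.executemany("INSERT INTO cur VALUES (?, ?, ?, ?)", [
    (nat_a, "USE1-NatGateway-Bytes", 10.0, 0.45),
    (nat_a, "USE1-NatGateway-Hours", 1.0, 0.045),
    (nat_b, "USE1-NatGateway-Bytes", 2.0, 0.09),
    ("i-0123456789abcdef0", "USE1-DataTransfer-Regional-Bytes", 5.0, 0.05),
])

# Data processing volume and cost per NAT gateway, largest cost first.
# The usage type filter keeps the hourly NAT gateway charge out of the result.
query = """
SELECT line_item_resource_id,
       SUM(line_item_usage_amount)   AS usage_amount,
       SUM(line_item_unblended_cost) AS cost
FROM cur
WHERE line_item_resource_id LIKE '%nat%'
  AND line_item_usage_type LIKE '%NatGateway-Bytes'
GROUP BY line_item_resource_id
ORDER BY cost DESC
"""
results = conn.execute(query).fetchall()
for resource_id, usage_amount, cost in results:
    print(resource_id, usage_amount, cost)
```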

So far, we’ve discussed the data transfer fields in the CUR and how to use Athena queries to analyze the specific areas of your account that generate high data transfer traffic. Now let’s discuss how the data transfer dashboard from Well-Architected Labs lets you visualize usage and costs in QuickSight charts, without having to write SQL queries in Athena.

WA Labs QuickSight Data Transfer Dashboard

Although you can query data transfer usage with the CUR and Athena, we have also released an enterprise dashboard with pre-built analyses, developed as a starting point based on common needs from many customers. You can further customize these analyses, or add new ones, in QuickSight on top of the dataset created from WA Labs.

The data transfer dashboard from WA Labs provides three different views. They’re designed so that you can easily navigate from Region to resource to operation, whether the traffic goes to the Internet or is regional traffic that stays on the AWS backbone. Regional traffic is still charged, depending on how you have architected your application infrastructure.

  1. Data Transfer Summary – A quick snapshot of data transfer across the multiple accounts in your organization.
  2. Internet Data Transfer Details – Detailed visualizations for diving deep into the traffic patterns between AWS resources and the Internet.
  3. Regional Data Transfer Details – Detailed visualizations for diving deep into the traffic patterns between and within AWS Regions and Availability Zones.

The following Sankey diagrams summarize the cost and the usage of traffic from the source (“From”) location to the target (“To”) location, i.e., AWS Regions and the Internet.

Figure 1. Cost analysis

Figure 2. Traffic Analysis
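Conceptually, each link in a Sankey chart like these is a total over rows that share the same product_from_location and product_to_location. The following Python sketch shows that aggregation on the four sample rows from the CUR table above; it illustrates the idea and is not necessarily how the WA Labs dashboard computes its links.

```python
from collections import defaultdict


def sankey_links(rows):
    """Sum cost per (from, to) location pair; each pair is one Sankey link."""
    totals = defaultdict(float)
    for row in rows:
        key = (row["product_from_location"], row["product_to_location"])
        totals[key] += row["line_item_unblended_cost"]
    # Largest flows first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# The four sample rows from the CUR table shown earlier (cost column only).
sample = [
    {"product_from_location": "Asia Pacific (Singapore)",
     "product_to_location": "External", "line_item_unblended_cost": 5.50e-05},
    {"product_from_location": "Asia Pacific (Singapore)",
     "product_to_location": "Asia Pacific (Singapore)", "line_item_unblended_cost": 3.11e-05},
    {"product_from_location": "US East (N. Virginia)",
     "product_to_location": "External", "line_item_unblended_cost": 2.65e-04},
    {"product_from_location": "US East (N. Virginia)",
     "product_to_location": "External", "line_item_unblended_cost": 1.05e-06},
]
links = sankey_links(sample)
for (src, dst), cost in links:
    print(f"{src} -> {dst}: {cost:.2e}")
```

Swapping line_item_unblended_cost for line_item_usage_amount gives the equivalent traffic view.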

Both the Internet and Regional views provide cost and usage analyses by resource and operation for all of the AWS accounts and Regions that you use. You can also drill down to a specific resource and its operations, and drill back up.

Figure 3. Resource Analysis

Figure 4. Operations Analysis

The two charts above are connected. When you select a top resource, the operations chart shows the operations that the resource performs and their associated cost. When you select a specific operation, the resource chart on the left shows how many resources perform that operation, along with their cost. For example, in the “Resource Analysis” chart above, one of the top resources performs InterZone-In and InterZone-Out operations. Use the data pulled from the CUR (through either Athena or QuickSight), together with the definitions in this post, to dive further with the VPC Flow Logs analysis described in this post. That analysis lets you narrow down which VPC, subnet, AZ, and specific IP addresses are on the path and causing data transfer charges.
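The linked charts behave like two indexes over the same rows: resource to operations, and operation to resources. This hypothetical Python sketch builds both from CUR-style rows (the resource IDs and costs are made up), so you can reproduce the same drill-down on query results outside QuickSight.

```python
from collections import defaultdict


def build_drilldowns(rows):
    """Index cost both ways: resource -> operation and operation -> resource."""
    by_resource = defaultdict(lambda: defaultdict(float))
    by_operation = defaultdict(lambda: defaultdict(float))
    for row in rows:
        resource = row["line_item_resource_id"]
        operation = row["line_item_operation"]
        cost = row["line_item_unblended_cost"]
        by_resource[resource][operation] += cost
        by_operation[operation][resource] += cost
    return by_resource, by_operation


# Made-up rows for illustration.
rows = [
    {"line_item_resource_id": "i-aaa", "line_item_operation": "InterZone-In",
     "line_item_unblended_cost": 0.5},
    {"line_item_resource_id": "i-aaa", "line_item_operation": "InterZone-Out",
     "line_item_unblended_cost": 0.25},
    {"line_item_resource_id": "i-bbb", "line_item_operation": "InterZone-Out",
     "line_item_unblended_cost": 1.0},
    {"line_item_resource_id": "i-bbb", "line_item_operation": "PublicIP-Out",
     "line_item_unblended_cost": 0.125},
]
by_resource, by_operation = build_drilldowns(rows)
print(dict(by_resource["i-aaa"]))           # {'InterZone-In': 0.5, 'InterZone-Out': 0.25}
print(dict(by_operation["InterZone-Out"]))  # {'i-aaa': 0.25, 'i-bbb': 1.0}
```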

As the diagram above shows, InterZone-In/Out are the top contributors to data transfer cost. The following table explains what InterZone-In/Out and related operations mean. You can correlate these flows with the patterns explained in this post to see what changes to your deployment architecture would avoid data transfer charges.

These are examples of usage_type and operation combinations, independent of their charges.

# | Usage Type | Operation Type | Variant | Direction
1 | DataTransfer-Regional-Bytes | InterZone-In | Cross AZ (private IP) | In
2 | DataTransfer-Regional-Bytes | InterZone-Out | Cross AZ (private IP) | Out
3 | DataTransfer-Regional-Bytes | PublicIP-In | Same or Different AZ (public IP) | In
4 | DataTransfer-Regional-Bytes | PublicIP-Out | Same or Different AZ (public IP) | Out
5 | DataTransfer-Regional-Bytes | LoadBalancingPublicIP-In | Same or Different AZ (ELB public IP) | In
6 | DataTransfer-Regional-Bytes | LoadBalancingPublicIP-Out | Same or Different AZ (ELB public IP) | Out
7 | DataTransfer-Regional-Bytes | VPCPeering-In | VPC Peering | In
8 | DataTransfer-Regional-Bytes | VPCPeering-Out | VPC Peering | Out
9 | {SRC}-DataTransfer-Regional-Bytes | PublicIP-In | Same or Different AZ (public IP) | In
10 | {SRC}-DataTransfer-Regional-Bytes | PublicIP-Out | Same or Different AZ (public IP) | Out
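If you want to label rows the same way in your own scripts, the table can be transcribed into a lookup. This is a minimal sketch: the mapping reproduces only the rows above, and the optional Region-prefix pattern is an assumption (capital letters followed by a digit, as in APS1).

```python
import re
from typing import Optional, Tuple

# Operation -> (variant, direction), transcribed from the table above.
REGIONAL_OPERATIONS = {
    "InterZone-In": ("Cross AZ (private IP)", "In"),
    "InterZone-Out": ("Cross AZ (private IP)", "Out"),
    "PublicIP-In": ("Same or Different AZ (public IP)", "In"),
    "PublicIP-Out": ("Same or Different AZ (public IP)", "Out"),
    "LoadBalancingPublicIP-In": ("Same or Different AZ (ELB public IP)", "In"),
    "LoadBalancingPublicIP-Out": ("Same or Different AZ (ELB public IP)", "Out"),
    "VPCPeering-In": ("VPC Peering", "In"),
    "VPCPeering-Out": ("VPC Peering", "Out"),
}

# DataTransfer-Regional-Bytes with an optional {SRC}- prefix (assumed format).
_REGIONAL = re.compile(r"^(?:[A-Z]{2,4}\d-)?DataTransfer-Regional-Bytes$")


def describe_regional(usage_type: str, operation: str) -> Optional[Tuple[str, str]]:
    """Return (variant, direction) for a regional usage type, else None."""
    if not _REGIONAL.match(usage_type):
        return None
    return REGIONAL_OPERATIONS.get(operation)


print(describe_regional("APS1-DataTransfer-Regional-Bytes", "InterZone-In"))
# ('Cross AZ (private IP)', 'In')
```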

As your AWS usage grows, monitoring data transfer costs becomes an ongoing activity, and this dashboard is a useful tool for visualizing the services, resources, and operations that generate data transfer usage. You can also add visualizations that provide insights based on the cost allocation tags in your AWS accounts. Tags are a great way to organize AWS resources in the AWS Management Console, and the CUR supports breaking down AWS costs by tag. Typically, customers use business tags, such as cost center, business unit, or project, to associate AWS costs with traditional financial reporting dimensions within their organization. Tags also let customers associate costs with technical or security dimensions, such as specific applications, environments, or compliance. For more information, see the whitepaper AWS Tagging Strategies: Implement an Effective AWS Resource Tagging Strategy. To use tags in these analyses, modify the Athena view query to add the tag fields, so that they are available in the QuickSight dataset that the analyses consume.
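A minimal sketch of carrying a cost allocation tag through a view and grouping by it, using SQLite in place of Athena. The tag column name resource_tags_user_cost_center, the view name, and the rows are hypothetical, and the actual WA Labs view definition is not reproduced here. COALESCE and NULLIF are available in both SQLite and Athena; here they group untagged rows (empty or NULL tag values) under a single label.

```python
import sqlite3

# Illustrative only: SQLite stands in for Athena. The tag column name is
# hypothetical; in your CUR table it depends on the tag key you activated.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cur (
    resource_tags_user_cost_center TEXT,
    line_item_usage_type TEXT,
    line_item_unblended_cost REAL
);
-- Hypothetical view that carries the tag through and labels untagged rows.
CREATE VIEW data_transfer_by_cost_center AS
SELECT COALESCE(NULLIF(resource_tags_user_cost_center, ''), 'untagged') AS cost_center,
       line_item_usage_type,
       line_item_unblended_cost
FROM cur;
""")
conn.executemany("INSERT INTO cur VALUES (?, ?, ?)", [
    ("CC-100", "USE1-DataTransfer-Regional-Bytes", 0.5),
    ("CC-100", "DataTransfer-Out-Bytes", 1.5),
    ("", "DataTransfer-Out-Bytes", 0.25),
    (None, "USE1-NatGateway-Bytes", 0.5),
])

# Data transfer cost per cost center, most expensive first.
results = conn.execute("""
SELECT cost_center, SUM(line_item_unblended_cost) AS cost
FROM data_transfer_by_cost_center
GROUP BY cost_center
ORDER BY cost DESC
""").fetchall()
print(results)  # [('CC-100', 2.0), ('untagged', 0.75)]
```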

The following Data Transfer dashboard provides a full, in-depth view of your data transfer usage and cost.

Figure 5. Data transfer summary dashboard

Conclusion

We have shown how you can run simple Athena SQL queries to fetch results for a specific scenario. We also showed how you can use the Data Transfer dashboard from WA Labs to visualize data transfer usage patterns and cost drivers, review monthly trends, and examine your Internet and Regional data transfer usage separately. Drill down to the resource level and its operations to understand the major contributors to data transfer costs. Refer to the VPC Flow Logs lab to analyze subnet- and IP-specific traffic; flow logs can also be queried with Athena to understand traffic to specific IP addresses.

Chaitanya Shah

Chaitanya Shah is a Sr. Technical Account Manager with AWS, based in New York. He has over 22 years of experience working with enterprise customers. He loves to code and actively contributes to AWS solutions and labs to help customers solve complex problems. He provides guidance to AWS customers on best practices for their AWS cloud migrations. He also specializes in AWS data transfer and in the data and analytics domain.