Networking & Content Delivery
Understand AWS Data transfer details in depth from cost and usage report using Athena query and QuickSight
Keeping applications up and running continuously requires architecting them to prevent downtime and to recover from failure as quickly as possible with minimal data loss, so that you meet your Recovery Time Objective (RTO) and Recovery Point Objective (RPO). AWS helps you achieve high availability for cloud workloads across multiple dimensions, such as compute, databases, and storage services. Data transfer cost is an important factor in architectural decisions, as customers design their cloud architecture to meet business requirements for highly available, fault-tolerant, and resilient applications. This post shows you how to use data stored in the Cost and Usage Report (CUR) to quickly find the resources behind your bandwidth cost, so that you can make the right architecture decisions and reduce data transfer costs.
AWS records the data transfer bandwidth used by your applications running in AWS in the CUR. A summary of data transfer usage and cost is also available in your monthly invoices, which gives you an overall picture of your data transfer spending by region and bandwidth used. The CUR, on the other hand, provides granular details across many dimensions. Therefore, it’s essential to understand the data transfer type, usage type, and operation in the CUR records. This will deepen your understanding of your bandwidth cost and show you which resources generate it.
Although data transfer charges appear in the CUR under the product family “Data Transfer”, services such as NAT Gateway, AWS Transit Gateway, and AWS PrivateLink (Interface Endpoint, Gateway Load Balancer) record data processing charges for the data transferred through them under their respective service names.
I will show you how AWS categorizes the various data transfer types, along with the important CUR fields related to data transfer. You will learn how to fetch this data with SQL queries in Amazon Athena and visualize it using the Data Transfer Cost and Usage Analysis Dashboard from Well-Architected Labs (WA Labs), built in Amazon QuickSight, to understand where these costs are coming from.
Which AWS tools can be used to visualize data transfer costs?
You can visualize the usage and costs for data transfer in various ways. AWS Cost Explorer is a great tool for visualizing daily, monthly, and forecasted spend by combining an array of available filters and grouping by service, linked account, and other dimensions. Refer to this post for more details.
Oftentimes, you want to understand which specific resources, operations, or AWS services are contributing to data transfer costs. Or you want to drill down to see usage at an hourly granularity, understand patterns, visualize them in meaningful QuickSight charts, and run queries to fetch results for specific services, applications, or tags. Athena makes all of this very easy. But before we get into the details of the queries and QuickSight charts, let’s look at the benefits of using the CUR, what data transfer is, and the data transfer types and usage types associated with it.
Benefits of using the CUR for cost and usage analysis
- Stores data at hourly granularity
- Long-term retention (the only cost is storing the CUR in Amazon Simple Storage Service (Amazon S3))
- Single view for multiple accounts when used in the Data Transfer dashboard in QuickSight
- Resource IDs (Amazon Resource Name (ARN), ID, or name, depending on the resource type)
- Custom, shareable, and embeddable QuickSight dashboards built on top of the CUR data
- Ability to join with other data sources from Athena (e.g., AWS account IDs, account names)
- Ability to apply transformations to the CUR (e.g., creating meaningful views, applying calculations to the cost and bytes columns, and joining with other Athena tables)
Understand data transfer types and usage types in the CUR
A pragmatic way to understand data transfer is through the data transfer types and usage types under which costs are recorded. Once you understand the usage_type and operation contributing to the data transfer cost, you can narrow down the specific architecture pattern that could be altered to avoid high data transfer charges, all while considering business requirements.
Note that the following table lists the major types that contribute to data transfer usage. Usage Type (line_item_usage_type) and Data Transfer Type (product_transfer_type) are available in the CUR as fields (Refer here).
| Category | Data Transfer Type | Usage Type | Description |
| Internet | AWS Inbound/Outbound | DataTransfer-[In/Out]-Bytes | Ingress/Egress data transfer for the internet |
| | | {SOURCE}-DataTransfer-[In/Out]-Bytes | |
| Direct | InterRegion | {SOURCE}-[POP]-DataXfer-[In/Out] | This is DX (Direct Connect) traffic, i.e., a physical connection |
| | | {SOURCE}-[POP]-DataXfer-[In/Out]:dc.3 | |
| CloudFront | InterRegion | CloudFront-In-Bytes | This refers to data transfer to us-east-1 from CloudFront |
| | | CloudFront-Out-Bytes | This refers to data transfer to CloudFront from us-east-1 |
| | | {REGION}-CloudFront-[In/Out]-Bytes | This refers to data transfer between AWS Region to CloudFront and |
| | InterRegion | {SOURCE}-CloudFrontChina-[In/Out]-Bytes | This is traffic between an AWS region and a CloudFront PoP in a China region |
| | CloudFront Outbound | {Region}-DataTransfer-Out-Bytes | You incur CloudFront charges when CloudFront responds to |
| | CloudFront to Origin | {Region}-DataTransfer-Out-OBytes | You incur CloudFront charges when users transfer data to your |
| Region | IntraRegion | DataTransfer-Regional-Bytes | This is Amazon EC2 traffic that moves between AZs but stays |
| | | {SOURCE}-DataTransfer-Regional-Bytes | |
| | | {SOURCE}-DataTransfer-xAZ-[In/Out]-Bytes | This refers to Cross-AZ data transfer to or |
| | IntraRegion-VPCpeering | {SOURCE}-DataTransfer-AZ-[In/Out]-Bytes | Data transfer in same AZ with a private IP |
| | | DataTransfer-Regional-Bytes OR {SOURCE}-DataTransfer-Regional-Bytes | Data transfer across AZ with a private IP |
| | InterRegion | {SOURCE}-{DESTINATION}-AWS-Out-Bytes | Data transfer out from the source AWS region to the destination AWS |
| | | {DESTINATION}-{SOURCE}-AWS-In-Bytes | Data transfer from the source AWS region to the destination AWS region |
| | Inter Region Peering Data Transfer Inbound/Outbound | {DESTINATION}-AWS-In-Bytes | Data transfer to destination AWS region from another AWS region |
| | | {SOURCE}-AWS-Out-Bytes | Data transfer out from source AWS region to another AWS region |
| S3/DDB/SNS/SQS | Accelerated AWS Outbound from close by location, Accelerated InterRegion Outbound using edge locations outside {Regions}, InterRegion Inbound/Outbound | DataTransfer-Out-ABytes-[T1/T2] | This refers to accelerated Amazon S3 data transfer out from |
| | | {SOURCE}-{DESTINATION}-S3RTC-[In/Out]-Bytes | This refers to cross AWS region replication data transfer. |
| S3 | Accelerated AWS Outbound from close by location, Accelerated InterRegion Outbound using edge locations outside {Regions} | {DESTINATION}-{SOURCE}-AWS-In-ABytes-[T1,T2] | This refers to accelerated data transfer traffic from Amazon S3 |
| | | {SOURCE}-{DESTINATION}-AWS-Out-ABytes-[T1,T2] | This refers to accelerated data transfer traffic from Amazon S3 |
| AWS | | {DESTINATION}-{SOURCE}-IN-Bytes-Internet | Data transfer premium IN bytes from source internet clients to |
| | | {SOURCE}-{DESTINATION}-OUT-Bytes-Internet | Data transfer premium OUT bytes from source AWS region to |
| Load | IntraRegion | DataTransfer-Regional-Bytes OR {SOURCE}-DataTransfer-Regional-Bytes | Internal ALB or CLB to target instances |
| | Data Processing-NAT Gateway | {SOURCE}-NatGateway-Bytes OR NatGateway-Bytes | Per bytes data processed by NAT Gateways from the source AWS region |
| | Data Processing-Load Balancers | {SOURCE}-DataProcessing-Bytes OR DataProcessing-Bytes | Per bytes data processed by the LoadBalancer |
| | Data Processing-Transit Gateway | {SOURCE}-TransitGateway-Bytes | Per bytes data processed by Transit Gateway VPC attachment OR Per bytes data processed by Transit Gateway DirectConnect OR Per bytes data processed by Transit Gateway VPN attachment |
| | Data Processing-VPC Endpoint, PrivateLink | {SOURCE}-VpcEndpoint-Bytes | Per bytes data processed by VPC Endpoints from source AWS region |
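The usage-type patterns in the table above can be matched programmatically once the CUR rows are exported. The following is a minimal Python sketch, not an official classifier: the suffixes come from the table, while the category labels (for example "inter-region") and the function name are names chosen only for this illustration. It assumes the optional {SOURCE} part is a region prefix such as APS1.

```python
import re
from typing import Tuple

# Suffix patterns taken from the usage-type table; the labels on the right
# are illustrative names invented for this sketch, not CUR values.
SUFFIX_RULES = [
    (re.compile(r"DataTransfer-Regional-Bytes$"), "intra-region"),
    (re.compile(r"DataTransfer-(In|Out)-Bytes$"), "internet"),
    (re.compile(r"AWS-(In|Out)-Bytes$"), "inter-region"),
    (re.compile(r"CloudFront-(In|Out)-Bytes$"), "cloudfront"),
    (re.compile(r"NatGateway-Bytes$"), "nat-gateway-processing"),
    (re.compile(r"TransitGateway-Bytes$"), "transit-gateway-processing"),
    (re.compile(r"VpcEndpoint-Bytes$"), "vpc-endpoint-processing"),
    (re.compile(r"DataProcessing-Bytes$"), "load-balancer-processing"),
]


def classify_usage_type(usage_type: str) -> Tuple[str, str]:
    """Return (prefix, category) for a line_item_usage_type value.

    The prefix is whatever precedes the matched suffix (empty when the
    usage type has no {SOURCE} part). Unmatched values map to "other".
    """
    for pattern, category in SUFFIX_RULES:
        match = pattern.search(usage_type)
        if match:
            prefix = usage_type[: match.start()].rstrip("-")
            return prefix, category
    return "", "other"


if __name__ == "__main__":
    for value in ("APS1-AWS-Out-Bytes", "DataTransfer-Out-Bytes"):
        print(value, "->", classify_usage_type(value))
```

Note that a usage type alone can be ambiguous: the table lists {Region}-DataTransfer-Out-Bytes under CloudFront as well, so in practice you would also look at line_item_product_code and product_transfer_type before drawing conclusions.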
The following table shows sample rows from the Athena table created for the CUR stored in Amazon S3. For example, the first row shows data transferred from Asia Pacific (Singapore), i.e., ap-southeast-1, to an Internet (external) endpoint for a VPC peering operation, and the second row shows data transferred within the ap-southeast-1 region via Amazon EC2.
Refer to the CUR data dictionary for more details on column names in the following header:
Note: line_item_usage_amount is in bytes
| line_item_product_code | line_item_operation | product_region | line_item_usage_type | product_from_location | product_to_location | line_item_usage_amount | line_item_unblended_cost | product_transfer_type |
| AmazonEC2 | VPCPeering-Out | ap-southeast-1 | APS1-AWS-Out-Bytes | Asia Pacific (Singapore) | External | 0.61 | 5.50E-05 | Inter Region Peering Data Transfer Outbound |
| AmazonEC2 | InterZone-In | ap-southeast-1 | APS1-DataTransfer-Regional-Bytes | Asia Pacific (Singapore) | Asia Pacific (Singapore) | 0.31 | 3.11E-05 | IntraRegion |
| AmazonEC2 | LoadBalancing | us-east-1 | DataTransfer-Out-Bytes | US East (N. Virginia) | External | 29.45 | 2.65E-04 | AWS Outbound |
| AmazonEC2 | RunInstances | us-east-1 | DataTransfer-Out-Bytes | US East (N. Virginia) | External | 117.18 | 1.05E-06 | AWS Outbound |
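To make the sample rows concrete, here is a small Python sketch that totals line_item_unblended_cost per product_transfer_type for the four rows above. The row values are copied from the table; the helper name total_cost_by is hypothetical, and with a real CUR you would more likely run the equivalent GROUP BY in Athena.

```python
from collections import defaultdict

# The four sample rows above, reduced to the columns used in this sketch.
SAMPLE_ROWS = [
    {"line_item_operation": "VPCPeering-Out",
     "product_transfer_type": "Inter Region Peering Data Transfer Outbound",
     "line_item_unblended_cost": 5.50e-05},
    {"line_item_operation": "InterZone-In",
     "product_transfer_type": "IntraRegion",
     "line_item_unblended_cost": 3.11e-05},
    {"line_item_operation": "LoadBalancing",
     "product_transfer_type": "AWS Outbound",
     "line_item_unblended_cost": 2.65e-04},
    {"line_item_operation": "RunInstances",
     "product_transfer_type": "AWS Outbound",
     "line_item_unblended_cost": 1.05e-06},
]


def total_cost_by(rows, column):
    """Sum unblended cost per distinct value of `column`, highest cost first."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[column]] += row["line_item_unblended_cost"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for transfer_type, cost in total_cost_by(SAMPLE_ROWS, "product_transfer_type"):
        print(f"{transfer_type}: {cost:.3e}")
```

Passing "line_item_operation" instead of "product_transfer_type" groups the same rows by operation.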
We recently announced the CUR Query library, a very helpful resource for fetching data for various AWS service usage using Athena queries. Here are some example Athena queries to fetch data transfer related to NAT from the view:
SELECT * FROM "(database)"."data_transfer_view" WHERE resource_id LIKE '%nat%';
Or from the CUR table:
SELECT * FROM "(database)"."(tablename)" WHERE line_item_resource_id LIKE '%nat%';
Note that you should replace (database) and (tablename) with your database and table name. You can create “data_transfer_view” with the instructions from WA Labs.
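If you generate such queries from a script, a helper like the following may be convenient. This is a sketch under assumptions: the function name and the example database and table names are hypothetical, and the query only uses CUR columns that appear in the sample table earlier in this post. It builds the SQL text only; running it still requires Athena.

```python
import re

# Only plain identifiers are accepted for database and table names,
# so that the generated SQL text stays predictable.
_IDENTIFIER = re.compile(r"[A-Za-z0-9_]+")


def build_cost_breakdown_query(database, table, resource_pattern="%nat%"):
    """Return an Athena SQL string that sums usage and cost per usage type
    and operation for resources whose ID matches `resource_pattern`."""
    for name in (database, table):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    # Double any single quote so the pattern stays inside the SQL literal.
    pattern = resource_pattern.replace("'", "''")
    return (
        "SELECT line_item_usage_type,\n"
        "       line_item_operation,\n"
        "       SUM(line_item_usage_amount) AS usage_amount,\n"
        "       SUM(line_item_unblended_cost) AS unblended_cost\n"
        f'FROM "{database}"."{table}"\n'
        f"WHERE line_item_resource_id LIKE '{pattern}'\n"
        "GROUP BY line_item_usage_type, line_item_operation\n"
        "ORDER BY unblended_cost DESC;"
    )


if __name__ == "__main__":
    # "cur_db" and "cur_table" are placeholder names for this example.
    print(build_cost_breakdown_query("cur_db", "cur_table"))
```

Grouping by usage type and operation, rather than selecting every column, gives a compact list of the largest NAT-related cost drivers first.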
So far, we’ve discussed the data transfer related fields in the CUR and how to use Athena queries to analyze the specific areas in your account that cause high data transfer traffic. Now let’s discuss how the data transfer dashboard from Well-Architected Labs lets you visualize usage and costs as QuickSight charts, without having to write SQL queries in Athena.
WA Labs QuickSight Data Transfer Dashboard
Although you can use the CUR and Athena to query data transfer usage, we have released an enterprise dashboard with pre-built analyses that were developed as a starting point based on the common needs of many customers. You can further customize these analyses, or add more, using QuickSight on top of the dataset created from WA Labs.
The data transfer dashboard from WA Labs provides three different views. They’re designed so that you can easily navigate from region to resource to operation, be it Internet traffic or regional traffic that stays on the AWS backbone but is still charged depending on how you have architected your application’s infrastructure.
- Data transfer Summary – A quick snapshot of the overall summary of data transfer across multiple accounts in your organization.
- Internet Data transfer Details – Detailed visualization to dive deep into the traffic patterns between AWS resources and the Internet.
- Regional Data transfer Details – Detailed visualization to dive deep into the traffic patterns between and within AWS Regions and Availability Zones.
The following Sankey diagrams show a summary of the usage and cost of traffic originating from the source (“From”) location to the target (“To”) location, i.e., AWS regions and the Internet.
Figure 1. Cost analysis
Figure 2. Traffic Analysis
Both the Internet and Regional views provide analyses of cost and usage by resource and operation for all of the AWS accounts and regions that you use. Furthermore, you can drill down or up to a specific resource and its operations.
Figure 3. Resource Analysis
Figure 4. Operations Analysis
The two charts above are connected to each other. When you select a top resource, the dashboard shows the operations that the resource performs and their associated cost. When you select a specific operation, it shows which resources on the left side perform that operation, along with the cost. For example, in the “Resource Analysis” diagram above, one of the top resources performs InterZone-In or InterZone-Out operations. Use the data pulled from the CUR (from either Athena or QuickSight) and the definitions explained in this post to dig further with the VPC Flow Logs analysis from this post. This will let you narrow down which VPC, subnet, AZ, and specific IP addresses are on the path and causing data transfer charges.
As the diagram above shows, InterZone-In/Out are the top contributors to the data transfer cost. The following table shows what InterZone-In/Out means. You can correlate this flow with the pattern explained in this post to see what changes you can make to your deployment architecture to avoid data transfer charges.
These are examples of the usage_type and operation combinations regardless of charge.
| # | Usage Type | Operation Type | Variant | Direction |
| 1 | DataTransfer-Regional-Bytes | InterZone-In | Cross AZ (private IP) | In |
| 2 | DataTransfer-Regional-Bytes | InterZone-Out | Cross AZ (private IP) | Out |
| 3 | DataTransfer-Regional-Bytes | PublicIP-In | Same or Different AZ (public IP) | In |
| 4 | DataTransfer-Regional-Bytes | PublicIP-Out | Same or Different AZ (public IP) | Out |
| 5 | DataTransfer-Regional-Bytes | LoadBalancing-PublicIP-In | Same or Different AZ (ELB public IP) | In |
| 6 | DataTransfer-Regional-Bytes | LoadBalancing-PublicIP-Out | Same or Different AZ (ELB public IP) | Out |
| 7 | DataTransfer-Regional-Bytes | VPCPeering-In | VPC Peering | In |
| 8 | DataTransfer-Regional-Bytes | VPCPeering-Out | VPC Peering | Out |
| 9 | {SRC}-DataTransfer-Regional-Bytes | PublicIP-In | Same or Different AZ (public IP) | In |
| 10 | {SRC}-DataTransfer-Regional-Bytes | PublicIP-Out | Same or Different AZ (public IP) | Out |
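When you post-process query results, the table above can be encoded as a simple lookup so that each operation is annotated with its variant and direction. The following Python sketch is one way to do that; the dictionary contents mirror the table, while the names OPERATION_INFO and describe_operation are hypothetical.

```python
# Operation -> (variant, direction), copied from the table above.
OPERATION_INFO = {
    "InterZone-In": ("Cross AZ (private IP)", "In"),
    "InterZone-Out": ("Cross AZ (private IP)", "Out"),
    "PublicIP-In": ("Same or Different AZ (public IP)", "In"),
    "PublicIP-Out": ("Same or Different AZ (public IP)", "Out"),
    "LoadBalancing-PublicIP-In": ("Same or Different AZ (ELB public IP)", "In"),
    "LoadBalancing-PublicIP-Out": ("Same or Different AZ (ELB public IP)", "Out"),
    "VPCPeering-In": ("VPC Peering", "In"),
    "VPCPeering-Out": ("VPC Peering", "Out"),
}


def describe_operation(operation):
    """Return (variant, direction) for a line_item_operation value.

    An en dash is normalized to a hyphen, since copied text sometimes
    carries one. Operations not in the table map to ("Unknown", "Unknown").
    """
    key = operation.replace("\u2013", "-").strip()
    return OPERATION_INFO.get(key, ("Unknown", "Unknown"))


if __name__ == "__main__":
    for op in ("InterZone-In", "VPCPeering-Out", "RunInstances"):
        variant, direction = describe_operation(op)
        print(f"{op}: {variant} / {direction}")
```

The lookup covers only the regional operations listed in the table, so operations such as RunInstances intentionally fall through to "Unknown".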
As your usage of AWS services increases, monitoring data transfer costs becomes an ongoing activity, and this dashboard becomes a useful tool for visualizing the services, resources, and operations that are sources of data transfer usage. Furthermore, you can add more visualizations that provide insights based on the cost allocation tags in your AWS accounts. Tags are a great way to organize AWS resources in the AWS Management Console. The CUR supports breaking down AWS costs by tag. Typically, customers use business tags, such as cost center, business unit, or project, to associate AWS costs with traditional financial reporting dimensions within their organization. Tags also let customers associate costs with technical or security dimensions, such as specific applications, environments, or compliance. For more information, see the whitepaper AWS Tagging Strategies: Implement an Effective AWS Resource Tagging Strategy. To use tags in the dashboard, you must modify the Athena view query to add the tag fields, so that they are available in the QuickSight dataset consumed by these analyses.
The following Data Transfer dashboard provides a full, in-depth view of your data transfer usage and cost.
Figure 5. Data transfer summary dashboard
Conclusion
We have shown how you can run simple Athena SQL queries to fetch results for a specific scenario. Moreover, we showed how you can use the Data Transfer dashboard from WA Labs to visualize data transfer usage patterns and cost drivers, review monthly trends, and identify your Internet and Regional data transfer usage independently. Drill down to the resource level and its operations to understand the major contributors to data transfer costs. Refer to the VPC Flow Logs lab to analyze subnet- and IP-specific traffic, which can also be queried via Athena, and to further understand traffic to specific IPs.