With Amazon Athena, you only pay for the queries that you run. You are charged based on the amount of data scanned by each query. You can get significant cost savings and performance gains by compressing, partitioning, or converting your data to a columnar format, because each of those operations reduces the amount of data that Athena needs to scan to execute a query.
Price per query
Save more when you use columnar data formats, partition, and compress your data
You can save from 30% to 90% on your per-query costs and get better performance by compressing, partitioning, and converting your data into columnar formats.
You are charged for the number of bytes scanned by Amazon Athena, rounded up to the nearest megabyte, with a 10MB minimum per query. There are no charges for Data Definition Language (DDL) statements like CREATE/ALTER/DROP TABLE, statements for managing partitions, or failed queries. Cancelled queries are charged based on the amount of data scanned.
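The billing rules above can be expressed as a small function. This is a minimal sketch, not Athena's actual billing code. The state strings mirror Athena's query-state names (SUCCEEDED, FAILED, CANCELLED). The `free_statement` flag, the function names, and the unit sizes (1 MB = 2^20 bytes, 1 TB = 2^20 MB) are assumptions made for illustration. The text does not say whether the 10 MB minimum also applies to a cancelled query that scanned almost nothing; this sketch assumes it does.

```python
MB = 1024 * 1024         # assumption: 1 MB = 2**20 bytes
MB_PER_TB = 1024 * 1024  # assumption: 1 TB = 2**20 MB
MIN_BILLED_MB = 10       # 10 MB minimum per query


def billed_megabytes(bytes_scanned: int, state: str, free_statement: bool = False) -> int:
    """Megabytes billed for one query under the rules described above.

    `state` uses Athena's query-state names (SUCCEEDED, FAILED, CANCELLED).
    `free_statement` is a hypothetical flag marking DDL or partition-management
    statements.
    """
    # DDL, partition-management statements and failed queries are not charged.
    if free_statement or state == "FAILED":
        return 0
    # Succeeded and cancelled queries pay for what they scanned:
    # round up to a whole megabyte (integer ceiling), then apply the 10 MB floor.
    return max(-(-bytes_scanned // MB), MIN_BILLED_MB)


def query_cost(bytes_scanned: int, state: str, price_per_tb: float,
               free_statement: bool = False) -> float:
    """Dollar cost at a caller-supplied per-TB price (prices vary by Region)."""
    return billed_megabytes(bytes_scanned, state, free_statement) / MB_PER_TB * price_per_tb
```

For example, a successful query that scans a single byte is billed as 10 MB, and one that scans 15 MB plus one byte is billed as 16 MB.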
Compressing your data allows Athena to scan less data. Converting your data to a columnar format allows Athena to read only the columns a query needs; Athena supports the Apache ORC and Apache Parquet columnar formats. Partitioning your data also restricts the amount of data scanned, because Athena can skip partitions that a query does not need when the query filters on the partition columns. Together, these lead to cost savings and improved performance. You can see the amount of data scanned per query on the Athena console. For details, see the pricing example below.
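The following is a rough model of how partitioning and a columnar layout each reduce the bytes scanned. It is a simplified sketch, not how Athena plans queries. The partition keys, column names, and byte counts are hypothetical, and the model ignores file metadata and other format overhead. It only shows that partition pruning skips whole partitions, while a columnar format additionally skips columns the query does not reference.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    key: str                     # hypothetical partition value, e.g. a date
    column_bytes: dict[str, int]  # stored bytes per column in this partition


def estimate_scanned_bytes(partitions: list[Partition], wanted_keys: set[str],
                           wanted_columns: list[str], columnar: bool) -> int:
    total = 0
    for p in partitions:
        if p.key not in wanted_keys:
            continue  # pruned: this partition is never read
        if columnar:
            # Columnar: read only the referenced columns.
            total += sum(p.column_bytes[c] for c in wanted_columns)
        else:
            # Row-oriented text: every column in the partition is read.
            total += sum(p.column_bytes.values())
    return total
```

With two equally sized partitions of three 100-byte columns each, a query on one column of one partition scans 100 bytes in the columnar case and 300 bytes in the row-oriented case. Without pruning, the row-oriented case scans all 600 bytes.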
Federated query pricing
You are charged for the number of bytes scanned by Amazon Athena aggregated across all data sources, rounded up to the nearest megabyte, with a 10MB minimum per query.
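This is one reading of the federated rule: bytes are summed across all data sources first, and only the total is rounded up and subject to the 10 MB minimum. It is a sketch under that assumption. The source names are hypothetical, and 1 MB is assumed to be 2^20 bytes.

```python
MB = 1024 * 1024   # assumption: 1 MB = 2**20 bytes
MIN_BILLED_MB = 10


def federated_billed_megabytes(bytes_by_source: dict[str, int]) -> int:
    # Aggregate the bytes scanned across every data source the query touched...
    total_bytes = sum(bytes_by_source.values())
    # ...then round the aggregate up to whole megabytes and apply the floor once.
    return max(-(-total_bytes // MB), MIN_BILLED_MB)
```

For a query that scans 30 MB plus one byte from one source and 20 MB plus one byte from another, this gives 51 MB. Rounding each source separately would give 52 MB.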
Amazon Athena queries data directly from Amazon S3. There are no additional storage charges for querying your data with Athena. You are charged standard S3 rates for storage, requests, and data transfer. By default, query results are stored in an S3 bucket of your choice and are also billed at standard Amazon S3 rates.
If you use the AWS Glue Data Catalog with Athena, you are charged standard AWS Glue Data Catalog rates. For details, visit the AWS Glue pricing page.
Additionally, you are charged standard rates for the other AWS services that you use with Athena, such as Amazon S3, AWS Lambda, AWS Glue, and Amazon SageMaker. If you use AWS Lambda, you are charged based on the number of requests for your functions and their duration, the time it takes for your code to execute.
Federated queries invoke AWS Lambda functions in your account, and you are charged for Lambda use at standard rates. Lambda functions invoked by federated queries are subject to Lambda’s free tier. Please visit the AWS Lambda pricing page for details.
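As a rough illustration, the Athena charge and the Lambda charge for a federated query can be added together. This is a sketch, not a billing calculator. All rates are caller-supplied placeholders, and the class and function names are hypothetical. Duration is modeled the way Lambda bills it, in GB-seconds (duration multiplied by configured memory). The sketch leaves out the Lambda free tier, S3 request and storage charges, and Glue charges.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    # Placeholder rates supplied by the caller; look up current regional prices.
    athena_per_tb: float
    lambda_per_request: float
    lambda_per_gb_second: float


def federated_query_cost(scanned_tb: float, lambda_invocations: int,
                         avg_duration_s: float, memory_gb: float, rates: Rates) -> float:
    # Athena: data scanned across all sources (already rounded per the rules above).
    athena = scanned_tb * rates.athena_per_tb
    # Lambda: a per-request charge plus duration weighted by memory (GB-seconds).
    requests = lambda_invocations * rates.lambda_per_request
    duration = lambda_invocations * avg_duration_s * memory_gb * rates.lambda_per_gb_second
    return athena + requests + duration
```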
Consider a table with three equally sized columns, stored as an uncompressed text file with a total size of 3 TB on Amazon S3. A query that retrieves data from a single column requires Amazon Athena to scan the entire file, because a text file stores each row's values together and cannot be read one column at a time.
- This query would cost $15 (3 TB scanned * $5/TB = $15).
If you compress your file using GZIP, you might see a 3:1 compression ratio, which leaves a compressed file of 1 TB. The same query on this file would cost $5 (1 TB * $5/TB). Athena still has to scan the entire file, but because the file is one third of its original size, you pay one third of what you did before.
If you compress your file and also convert it to a columnar format like Apache Parquet, again achieving 3:1 compression, you still end up with 1 TB of data on Amazon S3. In this case, however, because Parquet is columnar, Amazon Athena can read only the columns relevant to the query. Because this query references a single column, Athena reads only that column and avoids reading the other two thirds of the file. It therefore scans just 1 TB / 3 ≈ 0.33 TB of data from S3.
- This query would cost $1.67: a 3x saving from compression and a further 3x saving from reading only one column.
(File size = 3 TB / 3 = 1 TB. Data scanned when reading a single column = 1 TB / 3 ≈ 0.33 TB. Price = 1/3 TB * $5/TB ≈ $1.67.)
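The three scenarios can be reproduced with a few lines of arithmetic. This sketch uses the example's $5/TB rate and its simplifying assumption of equally sized columns. At this scale, the per-megabyte rounding and the 10 MB minimum make no practical difference, so they are omitted. The function name is hypothetical.

```python
PRICE_PER_TB = 5.0  # rate used in the example above


def scenario_cost(raw_tb: float, compression_ratio: float, columns_total: int,
                  columns_read: int, columnar: bool) -> float:
    stored_tb = raw_tb / compression_ratio
    # A row-oriented text file is read in full; a columnar file lets Athena read
    # only the referenced columns (columns are assumed to be equally sized).
    scanned_tb = stored_tb * columns_read / columns_total if columnar else stored_tb
    return round(scanned_tb * PRICE_PER_TB, 2)


print(scenario_cost(3, 1, 3, 1, columnar=False))  # uncompressed text: 15.0
print(scenario_cost(3, 3, 3, 1, columnar=False))  # GZIP-compressed text: 5.0
print(scenario_cost(3, 3, 3, 1, columnar=True))   # compressed Parquet: 1.67
```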