With Amazon Athena, you pay only for what you use. There are no upfront fees, no required minimum commitments, and no long-term contracts. You will be charged at the end of the month for your usage.
To get started, create a workgroup. A workgroup lets you specify your query engine, the working directory in Amazon Simple Storage Service (Amazon S3) that holds the results of your execution, AWS Identity and Access Management (IAM) roles (if needed), and your resource tags. You can use workgroups to separate users, teams, applications, or workloads; set limits on the amount of data that each query or the entire workgroup can process; and track costs. Depending on the workgroup that you create, you can either (a) run SQL-based queries and be charged for the number of bytes scanned or (b) run Apache Spark Python code and be charged an hourly rate for executing your code.
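As a rough illustration of the workgroup settings described above, the sketch below assembles the arguments for the Athena `create_work_group` API call as exposed by boto3. The workgroup name, bucket path, tag values, and the helper `build_workgroup_request` are hypothetical placeholders, and the 1 GB per-query limit is only an example value. The actual API call is left as a comment because it needs AWS credentials.

```python
def build_workgroup_request(name, output_location, per_query_limit_bytes, tags):
    """Assemble keyword arguments for boto3's Athena create_work_group call.

    All argument values are supplied by the caller; nothing here is a default
    recommended by AWS.
    """
    return {
        "Name": name,
        "Configuration": {
            # S3 location that holds query results
            "ResultConfiguration": {"OutputLocation": output_location},
            # Make the workgroup settings override client-side settings
            "EnforceWorkGroupConfiguration": True,
            # Publish usage metrics so that costs can be tracked
            "PublishCloudWatchMetricsEnabled": True,
            # Upper limit on the bytes a single query may scan
            "BytesScannedCutoffPerQuery": per_query_limit_bytes,
        },
        "Tags": [{"Key": key, "Value": value} for key, value in tags.items()],
    }


# Hypothetical values, for illustration only.
request = build_workgroup_request(
    name="analytics-team",
    output_location="s3://example-bucket/athena-results/",
    per_query_limit_bytes=1_000_000_000,
    tags={"team": "analytics"},
)

# With credentials configured, the request could be sent like this:
#   import boto3
#   boto3.client("athena").create_work_group(**request)
```

Keeping the request construction separate from the API call makes the limit and tag settings easy to review before anything is created in an account.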
Price per query
Save more when you use columnar data formats, partition, and compress your data
You can save from 30% to 90% on your per-query costs and get better performance by compressing, partitioning, and converting your data into columnar formats.
SQL-based queries using a query editor or AWS Command Line Interface (CLI)
You are charged based on the amount of data scanned by each query. You can get significant cost savings and performance gains by compressing, partitioning, or converting your data to a columnar format because each of those operations reduces the amount of data that Athena needs to scan to run a query.
You are charged for the number of bytes that Athena scans, rounded up to the nearest megabyte, with a 10 MB minimum per query. There are no charges for Data Definition Language (DDL) statements like CREATE/ALTER/DROP TABLE, statements for managing partitions, or failed queries. Canceled queries are charged for the amount of data scanned up to the point of cancellation.
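The rounding and minimum rules above can be sketched as a small calculation. This is an illustration, not a billing tool: the $5 per TB rate is taken from the SQL example later on this page, and the decimal definitions of MB and TB (10^6 and 10^12 bytes) are assumptions, as are the function names.

```python
MB = 10**6   # assumption: decimal megabyte
TB = 10**12  # assumption: decimal terabyte
PRICE_PER_TB_USD = 5.00  # rate used in the SQL example on this page
MINIMUM_BILLABLE_BYTES = 10 * MB


def billable_bytes(bytes_scanned: int) -> int:
    """Round up to the nearest megabyte, then apply the 10 MB minimum.

    Intended for queries that are billed at all; DDL statements and
    failed queries are not charged and should not be passed in.
    """
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must not be negative")
    rounded_up = -(-bytes_scanned // MB) * MB  # integer ceiling to a whole MB
    return max(rounded_up, MINIMUM_BILLABLE_BYTES)


def query_cost_usd(bytes_scanned: int) -> float:
    """Estimated charge for one billed query under the assumptions above."""
    return billable_bytes(bytes_scanned) / TB * PRICE_PER_TB_USD


print(billable_bytes(1))           # 10000000 (the minimum applies)
print(billable_bytes(10_000_001))  # 11000000 (rounded up to a whole MB)
```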
Compressing your data allows Athena to scan less data. Converting your data to columnar formats allows Athena to selectively read only required columns to process the data. Athena supports Apache ORC and Apache Parquet. Partitioning your data also allows Athena to restrict the amount of data scanned. This leads to cost savings and improved performance. You can see the amount of data scanned per query on the Athena console. For details, see the pricing example below.
Federated query pricing
You are charged for the number of bytes scanned by Athena, aggregated across all data sources, rounded up to the nearest megabyte, with a 10 MB minimum per query.
Run Apache Spark Python code using notebooks or AWS CLI
You pay only for the time that your Apache Spark application takes to run. You are charged an hourly rate based on the number of data processing units (DPUs) used to run your Apache Spark application. A single DPU provides 4 vCPUs and 16 GB of memory. You are billed in increments of 1 second, rounded up to the nearest second.
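To make the per-second billing concrete, here is a minimal sketch that converts a run time into DPU-hours. The $0.35 per DPU-hour rate is the one used in the Spark example later on this page; the function names are illustrative.

```python
import math

PRICE_PER_DPU_HOUR_USD = 0.35  # rate used in the Spark example on this page


def billed_seconds(runtime_seconds: float) -> int:
    """Billing is in 1-second increments, rounded up to the nearest second."""
    return math.ceil(runtime_seconds)


def dpu_hours(dpus: int, runtime_seconds: float) -> float:
    """DPU-hours consumed by `dpus` DPUs running for `runtime_seconds`."""
    return dpus * billed_seconds(runtime_seconds) / 3600


def spark_cost_usd(dpus: int, runtime_seconds: float) -> float:
    """Estimated charge for one run under the rate assumed above."""
    return dpu_hours(dpus, runtime_seconds) * PRICE_PER_DPU_HOUR_USD


print(billed_seconds(59.2))  # 60
```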
When you start a Spark session, either by starting a notebook on the Athena console or by using the Athena API, two nodes are provisioned for your application: a notebook node that acts as the server for the notebook user interface, and a Spark driver node that coordinates the Spark application and communicates with all the Spark worker nodes. The Spark worker nodes are responsible for running the tasks of the Spark application.
After you start a session, you can run a Spark application either by executing cells in your notebook or by using the start-calculation-execution API. When you run a Spark application, Athena provisions Spark worker nodes for it. When your application completes execution, the worker nodes are released. Because Athena for Apache Spark scales down resources automatically, it minimizes the idle fees that you pay. When the Spark session ends, the driver node, any worker nodes, and the notebook node are released.
Athena charges you for the driver and worker nodes that are used during the session. Athena provides notebooks on the console as a user interface for creating, submitting, and executing Spark applications, and offers them at no additional cost. Athena does not charge for the notebook nodes used during the Spark session.
Athena queries data directly from Amazon S3. There are no additional storage charges for querying your data with Athena. You are charged standard S3 rates for storage, requests, and data transfer. By default, query results are stored in an S3 bucket of your choice and are also billed at standard S3 rates.
If you use the AWS Glue Data Catalog with Athena, you are charged standard Data Catalog rates. For details, visit the AWS Glue pricing page.
Additionally, you are charged standard rates for the AWS services that you use with Athena, such as Amazon S3, AWS Lambda, AWS Glue, and Amazon SageMaker. For example, you are charged S3 rates for storage, requests, and inter-Region data transfer. If you use Lambda, you are charged based on the number of requests for your functions and the duration (the time it takes for your code to run).
Federated queries invoke Lambda functions in your account, and you are charged for Lambda use at standard rates. Lambda functions invoked by federated queries are subject to the Lambda Free Tier. For details, visit the Lambda pricing page.
Example 1 – SQL query
Consider a table with 4 equally sized columns, stored as an uncompressed text file with a total size of 3 TB on Amazon S3. Running a query to get data from a single column of the table requires Amazon Athena to scan the entire file, because a row-based text format does not let Athena read one column without reading the others.
- This query would cost: $15. (Price for 3 TB scanned is 3 * $5/TB = $15.)
If you compress your file using GZIP, you might see 3:1 compression gains. In this case, you would have a compressed file with a size of 1 TB. The same query on this file would cost $5. Athena has to scan the entire file again, but because it is three times smaller, you pay one-third of what you did before.
If you compress your file and also convert it to a columnar format like Apache Parquet, achieving 3:1 compression, you would still end up with 1 TB of data on S3. In this case, because Parquet is columnar, Athena can read only the column that is relevant to the query being run. Because the query in question references a single column, Athena reads only that column and avoids reading three-fourths of the file. Since Athena reads only one-fourth of the file, it scans just 0.25 TB of data from S3.
- This query would cost: $1.25. There is a 3x savings from compression and 4x savings for reading only one column.
(File size = 3 TB / 3 = 1 TB. Data scanned when reading a single column = 1 TB / 4 = 0.25 TB. Price for 0.25 TB = 0.25 * $5/TB = $1.25.)
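The three scenarios in this example can be reproduced with a short calculation. The sketch below uses only the figures stated above (3 TB of raw data, a 3:1 compression ratio, one of four equally sized columns read, $5 per TB); the function and parameter names are illustrative.

```python
PRICE_PER_TB_USD = 5.00  # rate used in this example


def scan_cost_usd(raw_tb: float, compression_ratio: float = 1.0,
                  column_fraction: float = 1.0) -> float:
    """Cost of one query given how much of the stored data must be scanned.

    compression_ratio: raw size divided by stored size (3.0 for 3:1).
    column_fraction: share of the stored data the query has to read
                     (1.0 for row-based text, 0.25 for 1 of 4 equal columns).
    """
    scanned_tb = raw_tb / compression_ratio * column_fraction
    return scanned_tb * PRICE_PER_TB_USD


text_cost = scan_cost_usd(3.0)
gzip_cost = scan_cost_usd(3.0, compression_ratio=3.0)
parquet_cost = scan_cost_usd(3.0, compression_ratio=3.0, column_fraction=0.25)

print(f"Uncompressed text: ${text_cost:.2f}")     # $15.00
print(f"GZIP text:         ${gzip_cost:.2f}")     # $5.00
print(f"Compressed Parquet: ${parquet_cost:.2f}")  # $1.25
```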
Example 2 – Apache Spark application
Consider using a notebook in the Athena console to pull sales figures for the previous quarter and graph them to create a report. You start a session using a notebook. Your session lasts for 1 hour and submits 6 calculations. Each calculation uses 20 worker nodes of 1 DPU each and runs for 1 minute.
- Worker DPU-hours = Number of calculations * DPUs used per calculation * execution time of calculation = 6 calculations * 20 DPUs per calculation * (1/60) hours per calculation = 2.0 DPU-hours
- Driver DPU-hours = DPUs used per session * session time = 1 DPU per session * 1 hour per session = 1.0 DPU-hours
- Total DPU-hours = Worker DPU-hours + Driver DPU-hours = 2.0 DPU-hours + 1.0 DPU-hours = 3.0 DPU-hours
- Spark application charges = $0.35 per DPU-hour * 3.0 DPU-hours = $1.05
Note: S3 will charge you separately to store and read your data and the results of your execution.
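The arithmetic in Example 2 can be checked with a few lines of code. The sketch below uses only the figures from the example (6 calculations, 20 worker DPUs for 1 minute each, a 1-DPU driver for a 1-hour session, $0.35 per DPU-hour); the variable names are illustrative. As in the example, S3 charges are not included.

```python
PRICE_PER_DPU_HOUR_USD = 0.35  # rate used in this example

calculations = 6
worker_dpus_per_calculation = 20
minutes_per_calculation = 1
driver_dpus = 1
session_hours = 1

# Workers are billed only while a calculation is running.
worker_dpu_hours = (
    calculations * worker_dpus_per_calculation * minutes_per_calculation / 60
)
# The driver is billed for the whole session.
driver_dpu_hours = driver_dpus * session_hours

total_dpu_hours = worker_dpu_hours + driver_dpu_hours
charge_usd = total_dpu_hours * PRICE_PER_DPU_HOUR_USD

print(f"Worker DPU-hours: {worker_dpu_hours:.1f}")  # 2.0
print(f"Driver DPU-hours: {driver_dpu_hours:.1f}")  # 1.0
print(f"Total DPU-hours:  {total_dpu_hours:.1f}")   # 3.0
print(f"Spark charge:     ${charge_usd:.2f}")       # $1.05
```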