General

Q: What is Amazon Athena?
Athena is an interactive analytics service that makes it easier to analyze data in Amazon Simple Storage Service (S3) using Python or standard SQL. Athena is serverless, so there is no infrastructure to set up or manage, and you can start analyzing data immediately. You don’t even need to load your data into Athena; it works directly with data stored in Amazon S3. Amazon Athena for SQL uses Presto with full standard SQL support and works with various standard data formats, including CSV, JSON, Apache ORC, Apache Parquet, and Apache Avro. Athena for Apache Spark supports Python and allows you to use Apache Spark, an open-source, distributed processing system used for big data workloads. To get started, log in to the Athena Management Console and start interacting with your data using the query editor or notebooks.
 
Q: What can I do with Athena?
With Athena, you can analyze data stored in S3. You can use Athena to run interactive analytics using ANSI SQL or Python without the need to aggregate or load the data into Athena. Athena can process unstructured, semi-structured, and structured datasets. Examples include CSV, JSON, Avro, or columnar data formats like Parquet and ORC. Amazon Athena for SQL integrates with Amazon QuickSight for visualizing your data or creating dashboards. You can also use Athena to generate reports or explore data with business intelligence tools or SQL clients, connected with an ODBC or JDBC driver. 
 
Q: How do I get started with Athena?
To get started with Athena, log in to the AWS Management Console for Athena and create your schema by writing Data Definition Language (DDL) statements on the console or by using a create table wizard. You can then start querying data using a built-in query editor. Athena queries data directly from S3, so there’s no loading required.

Amazon Athena for SQL

Q: How do you access Athena?
Amazon Athena for SQL can be accessed through the AWS Management Console, an API, or an ODBC or JDBC driver. You can programmatically run queries, add tables, or partitions using the ODBC or JDBC driver.
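
As an illustration of API access, the following minimal Python sketch starts a query and polls until it finishes. It assumes a client object that exposes the Athena API operations StartQueryExecution and GetQueryExecution (for example, the client returned by boto3.client("athena")). The function name run_query, the polling interval, and the bucket path in the usage note are illustrative assumptions, not part of Athena.

```python
import time


def run_query(client, sql, output_location, poll_seconds=1.0):
    """Start an Athena query and wait until it reaches a terminal state.

    `client` is assumed to behave like a boto3 Athena client.
    Returns a (query_execution_id, final_state) tuple.
    """
    started = client.start_query_execution(
        QueryString=sql,
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Poll the execution status until the query succeeds, fails, or is cancelled.
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return query_id, state
        time.sleep(poll_seconds)
```

With boto3 installed and credentials configured, a call might look like run_query(boto3.client("athena"), "SELECT 1", "s3://example-bucket/athena-results/"), where the bucket name is a placeholder.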
 
Q: What are the service limits associated with Athena?
To learn more about service limits, review the Amazon Athena User Guide: Service Quotas.
 
Q: What is the underlying technology behind Athena for SQL?
Athena for SQL uses Presto with full standard SQL support and works with various standard data formats, including CSV, JSON, ORC, Avro, and Parquet. Athena can handle complex analysis, including large joins, window functions, and arrays. Because Athena uses S3 as the underlying data store, it is highly available and durable with data redundantly stored across multiple facilities and multiple devices in each facility. To learn more about Presto, review the Introduction to Presto webpage. 
 
Q: How does Athena for SQL store table definitions and schema?
Athena for SQL uses a managed AWS Glue Data Catalog to store information and schemas about the databases and tables that you create for your data stored in S3. In Regions where AWS Glue is available, you can upgrade to using the Data Catalog with Athena. In Regions where AWS Glue is not available, Athena uses an internal catalog.
 
You can modify the catalog using DDL statements or through the AWS Management Console. Any schemas that you define are automatically saved unless you explicitly delete them. Athena uses schema-on-read technology, which means that your table definitions are applied to your data in S3 when queries are run. There’s no data loading or transformation required. You can delete table definitions and schema without impacting the underlying data stored in S3.
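
To make the schema-on-read idea concrete, here is a small conceptual sketch in Python. It is not how Athena is implemented. It only shows that a schema can be applied to raw text at read time, and that changing or dropping the schema leaves the raw data untouched. The sample data and the function name read_with_schema are hypothetical.

```python
import csv
import io

# Raw data as it might sit in a file; it carries no type information of its own.
RAW_CSV = "1,alice,3.5\n2,bob,4.0\n"


def read_with_schema(raw_text, schema):
    """Apply a list of (column_name, cast_function) pairs to each row at read time."""
    rows = []
    for fields in csv.reader(io.StringIO(raw_text)):
        rows.append({name: cast(value) for (name, cast), value in zip(schema, fields)})
    return rows


# Two different table definitions projected onto the same unchanged data.
typed_schema = [("id", int), ("name", str), ("score", float)]
string_schema = [("id", str), ("name", str), ("score", str)]

print(read_with_schema(RAW_CSV, typed_schema))
print(read_with_schema(RAW_CSV, string_schema))
```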
 
Q: Why should I upgrade to Data Catalog?
AWS Glue is a fully managed extract, transform, and load (ETL) service. AWS Glue has three main components: 1) a crawler that automatically scans your data sources, identifies data formats, and infers schemas, 2) a fully managed ETL service that allows you to transform and move data to various destinations, and 3) a Data Catalog that stores metadata information about databases and tables either stored in S3 or an ODBC- or JDBC-compliant data store. To use the benefits of AWS Glue, you must upgrade from using Athena’s internal Data Catalog to the Glue Data Catalog.
 
Benefits of upgrading to the Data Catalog include the following:
  • Unified metadata repository: AWS Glue is integrated across various AWS services. AWS Glue supports data stored in Amazon Aurora, Amazon Relational Database Service (RDS) for MySQL, Amazon RDS for PostgreSQL, Amazon Redshift, and S3, as well as MySQL and PostgreSQL databases in your Amazon Virtual Private Cloud (VPC) running on Amazon Elastic Compute Cloud (EC2). AWS Glue provides out-of-the-box integration with Athena, Amazon EMR, Amazon Redshift Spectrum, and any application compatible with Apache Hive metastore.
  • Automatic schema and partition recognition: AWS Glue automatically crawls your data sources, identifies data formats, and suggests schemas and transformations. Crawlers can help automate table creation and automatic loading of partitions.
  • Easy-to-build pipelines: The AWS Glue ETL engine generates customizable, reusable, and portable Python code. You can edit the code using your favorite integrated development environment (IDE) or notebook and share it with others using GitHub. When your ETL job is ready, you can schedule it to run on the AWS Glue fully managed, scale-out Spark infrastructure. AWS Glue is serverless, so it handles provisioning, configuration, and scaling of the resources required to run your ETL jobs, allowing you to tightly integrate ETL in your workflow.
To learn more about the Data Catalog, review the AWS Glue webpage.
 
Q: Is there a step-by-step process to upgrade to the Data Catalog?
Yes. For a step-by-step process, review the Amazon Athena User Guide: Integration with AWS Glue.
 
Q: In which Regions is Athena available?
For details of Athena service availability by Region, review the AWS Regional Services List.

When to use Athena versus other big data services

Q: What is the difference between Athena, Amazon EMR, and Amazon Redshift?
Query services like Athena, data warehouses like Amazon Redshift, and sophisticated data processing frameworks like Amazon EMR all address different needs and use cases. You just need to choose the right tool for the job. Amazon Redshift provides the fastest query performance for enterprise reporting and business intelligence workloads, particularly those involving complex SQL with multiple joins and subqueries. Compared to on-premises deployments, Amazon EMR makes it simpler and more cost effective to run highly distributed processing frameworks, such as Apache Hadoop, Spark, and Presto. Amazon EMR is flexible: you can run custom applications and code and define specific compute, memory, storage, and application parameters to meet your analytic requirements. Athena provides a simplified way to run interactive queries for data in S3 without the need to set up or manage any servers.
 
Q: When should I use a full featured enterprise data warehouse, like Amazon Redshift, versus a query service like Athena?
A data warehouse like Amazon Redshift is your best choice when you need to pull together data from many different sources into a common format. These sources can include inventory systems, financial systems, and retail sales systems. A data warehouse like Amazon Redshift is also preferred when you need to store data long term to build sophisticated business reports from historical data.
 
Data warehouses collect data from across the company and act as the “single source of truth” for report generation and analysis. Data warehouses pull data from many sources, format and organize it, store it, and support complex, high-speed queries that produce business reports. The query engine in Amazon Redshift has been enhanced to perform especially well in this use case, where you must run complex queries that join large numbers of large database tables. TPC-DS is a standard benchmark designed to replicate this use case, and Amazon Redshift runs these queries faster than query services that are enhanced for unstructured data. When you must run queries against highly structured data with lots of joins across lots of large tables, you should choose Amazon Redshift. 
 
By comparison, query services like Athena streamline running interactive queries against data directly in S3 without worrying about formatting data or managing infrastructure. For example, Athena is great if you just need to run a quick query on some web logs to troubleshoot a performance issue on your site. With query services, you can get started fast: you define a table for your data and start querying using standard SQL.
 
You can also use both services together. If you stage your data on S3 before loading it into Amazon Redshift, Athena can also register and query that data. 
 
Q: When should I use Amazon EMR versus Athena?
Amazon EMR goes far beyond just running SQL queries. With Amazon EMR, you can run various scale-out data processing tasks for applications, such as machine learning (ML), graph analytics, data transformation, streaming data, and virtually anything that you can code. Use Amazon EMR if you use custom code to process and analyze large datasets with the latest big data processing frameworks, such as Apache HBase, Spark, Hadoop, or Presto. Amazon EMR gives you full control over the configuration of your clusters and the software installed on them.
 
You should use Athena if you want to run interactive SQL queries against data on S3 without having to manage any infrastructure or clusters. 
 
Q: Can I use Athena to query data that I process using Amazon EMR?
Yes, Athena supports many of the same data formats as Amazon EMR. The Athena Data Catalog is compatible with the Hive metastore. If you're using Amazon EMR and already have a Hive metastore, you run your DDL statements on Athena, and then you can start querying your data right away without impacting your Amazon EMR jobs. 
 
Q: How does federated query in Athena SQL relate to other AWS services?
Federated query in Athena provides you with a unified way to run SQL queries across various relational, nonrelational, and custom data sources. 
 
Q: How does ML in Athena relate to other AWS services? 
Athena SQL queries can invoke ML models deployed on Amazon SageMaker. You can specify the S3 location where you want to store the results of these Athena SQL queries.

Creating tables, data formats, and partitions

Q: How do I create tables and schemas for my data on S3?
Athena uses Apache Hive DDL to define tables. You can run DDL statements using the Athena console, with an ODBC or JDBC driver, through the API, or using the Athena create table wizard. If you use the Data Catalog with Athena, you can also use AWS Glue crawlers to automatically infer schemas and partitions. An AWS Glue crawler connects to a data store, progresses through a prioritized list of classifiers to extract the schema of your data and other statistics, and then populates the Data Catalog with this metadata. Crawlers can run periodically to detect the availability of new data and changes to existing data, including table definition changes. Crawlers automatically add new tables, new partitions to existing tables, and new versions of table definitions. You can customize AWS Glue crawlers to classify your own file types.
 
When you create a new table schema in Athena, the schema is stored in the Data Catalog and used when running queries, but it does not modify your data in S3. Athena uses an approach known as schema-on-read, which allows you to project your schema onto your data when you run a query. This decreases the need for any data loading or transformation. Learn more about creating tables
 
Q: Which data formats does Athena support?
Athena supports various data formats like CSV, TSV, JSON, or Textfiles and also supports open-source columnar formats, such as ORC and Parquet. Athena also supports compressed data in Snappy, Zlib, LZO, and GZIP formats. You can improve performance and reduce your costs by compressing, partitioning, and using columnar formats. 
 
Q: Which kinds of data types does Athena support?
Athena supports both simple data types, such as INTEGER, DOUBLE, and VARCHAR, and complex data types, such as MAP, ARRAY, and STRUCT.
 
Q: Can I run any Hive Query on Athena?
Athena uses Hive only for DDL, that is, for creating, modifying, and deleting tables and partitions. For a complete list of statements supported, review the Amazon Athena User Guide: DDL statements. Athena uses Presto when you run SQL queries on S3. You can run ANSI-compliant SQL SELECT statements to query your data on S3.
 
Q: What is a SerDe?
SerDe stands for Serializer/Deserializer, which are libraries that tell Hive how to interpret data formats. Hive DDL statements require you to specify a SerDe so that the system knows how to interpret the data that you’re pointing to. Athena uses SerDes to interpret the data read from S3. The concept of SerDes in Athena is the same as the concept used in Hive. Amazon Athena supports the following SerDes:
  • Apache Web Logs: "org.apache.hadoop.hive.serde2.RegexSerDe"
  • CSV: "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
  • TSV: "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
  • Custom Delimiters: "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
  • Parquet: "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
  • ORC: "org.apache.hadoop.hive.ql.io.orc.OrcSerde"
  • JSON: "org.apache.hive.hcatalog.data.JsonSerDe" or "org.openx.data.jsonserde.JsonSerDe"
 
Q: Can I add my own SerDe to Athena?
Currently, you cannot add your own SerDe to Athena. We appreciate your feedback, so if there are any SerDes that you would like to see added, contact the Athena team at athena-feedback@amazon.com.
 
Q: If I created Parquet/ORC files using Spark/Hive, will I be able to query them in Athena?
Yes, Parquet and ORC files created with Spark can be read in Athena.
 
Q: If I have data from Amazon Kinesis Data Firehose, how can I query it using Athena?
If your Kinesis Data Firehose data is stored in S3, you can query it using Athena. Create a schema for your data in Athena and start querying. We recommend that you organize the data into partitions to enhance performance. You can add partitions created by Data Firehose using ALTER TABLE DDL statements. Learn more about partitioning data
 
Q: Does Athena support data partitioning?
Yes. You can partition your data on any column with Athena. Partitions allow you to limit the amount of data that each query scans, leading to cost savings and faster performance. You can specify your partitioning scheme using the PARTITIONED BY clause in the CREATE TABLE statement. Learn more about partitioning data
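
As a rough illustration of the PARTITIONED BY clause, the Python sketch below assembles a Hive-style CREATE EXTERNAL TABLE statement as a string. The table name, columns, partition key, and S3 location are hypothetical placeholders. The helper create_table_ddl is an assumption made for this example and does no escaping, so it is not suitable for untrusted input.

```python
def create_table_ddl(table, columns, partition_columns, serde, location):
    """Build a Hive-style CREATE EXTERNAL TABLE statement as a string."""
    column_list = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns)
    partition_list = ", ".join(f"{name} {dtype}" for name, dtype in partition_columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {column_list}\n)\n"
        f"PARTITIONED BY ({partition_list})\n"
        f"ROW FORMAT SERDE '{serde}'\n"
        f"LOCATION '{location}'"
    )


# Hypothetical JSON web logs, partitioned by a date string.
ddl = create_table_ddl(
    table="web_logs",
    columns=[("request_id", "string"), ("status", "int")],
    partition_columns=[("dt", "string")],
    serde="org.openx.data.jsonserde.JsonSerDe",
    location="s3://example-bucket/web-logs/",
)
print(ddl)
```

The partition column (dt here) is declared only in PARTITIONED BY, not in the main column list.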
 
Q: How do I add new data to an existing table in Athena?
If your data is partitioned, you will need to run a metadata query (ALTER TABLE ADD PARTITION) to add the partition to Athena after new data becomes available on S3. If your data is not partitioned, adding the new data (or files) to the existing prefix automatically adds the data to Athena. Learn more about partitioning data.
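
For example, a job that lands new data each day could generate the matching ALTER TABLE ADD PARTITION statement. The Python sketch below only builds the statement text. The table name, partition key, and S3 path are hypothetical, and the helper add_partition_ddl is an illustrative assumption that does no escaping.

```python
def add_partition_ddl(table, partition_values, location):
    """Build an ALTER TABLE ... ADD PARTITION statement for one partition."""
    spec = ", ".join(f"{key} = '{value}'" for key, value in partition_values.items())
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION ({spec}) LOCATION '{location}'"
    )


statement = add_partition_ddl(
    table="web_logs",
    partition_values={"dt": "2024-01-15"},
    location="s3://example-bucket/web-logs/dt=2024-01-15/",
)
print(statement)
```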
 
Q: If I already have large quantities of log data on S3, can I use Athena to query it?
Yes, Athena streamlines the running of standard SQL queries on your existing log data. Athena queries data directly from S3, so there’s no data movement or loading required. Define your schema using DDL statements and start querying your data right away.

Querying and data formats

Q: Which kinds of queries does Athena support?
Athena supports ANSI SQL queries. Athena uses Presto, an open-source, in-memory, distributed SQL engine, and can handle complex analysis, including large joins, window functions, and arrays. 
 
Q: Can I use QuickSight with Athena?
Yes. Athena integrates with QuickSight, so you can seamlessly visualize your data stored in S3. 
 
Q: Does Athena support other business intelligence (BI) tools and SQL clients?
Yes. Athena comes with an ODBC and JDBC driver that you can use with other BI tools and SQL clients. Learn more about using an ODBC or JDBC driver with Athena. 
 
Q: How do I access the functions supported by Athena?
Learn more about functions supported by Athena. 
 
Q: How do I improve the performance of my query?
You can improve the performance of your query by compressing, partitioning, or converting your data into columnar formats. Athena supports open-source columnar data formats, such as Parquet and ORC. Converting your data into a compressed, columnar format lowers your cost and improves query performance by enabling Athena to scan less data from S3 when running your query.
 
Q: Does Athena support user-defined functions (UDFs)? 
Yes. Athena supports UDFs so that you can write custom scalar functions and invoke them in SQL queries. While Athena provides built-in functions, UDFs help you perform custom processing, such as compressing and decompressing data, redacting sensitive data, or applying customized decryption. 

You can write UDFs in Java using the Athena Query Federation SDK. When a UDF is used in a SQL query submitted to Athena, it is invoked and run on AWS Lambda. UDFs can be used in both SELECT and FILTER clauses of a SQL query. You can invoke multiple UDFs in the same query. 
 
Q: What is the user experience when writing a UDF? 
You can use the Athena Query Federation SDK to write your UDF. Review UDF examples. You can upload your function to Lambda and then invoke it in your Athena query. To get started, refer to the Amazon Athena User Guide: Creating and deploying a UDF using Lambda.
 
Athena will invoke your UDF on a batch of dataset rows to enhance performance. 
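
Once a UDF is deployed to Lambda, a query declares it with a USING EXTERNAL FUNCTION clause before the SELECT. The Python sketch below assembles such a query as text. The function name redact, the Lambda function name, and the table and column names are all hypothetical, and the helper udf_query is an assumption made for illustration.

```python
def udf_query(function_name, arguments, return_type, lambda_name, select_sql):
    """Prefix a SELECT statement with a Lambda-backed UDF declaration."""
    signature = ", ".join(f"{name} {dtype}" for name, dtype in arguments)
    return (
        f"USING EXTERNAL FUNCTION {function_name}({signature})\n"
        f"RETURNS {return_type}\n"
        f"LAMBDA '{lambda_name}'\n"
        f"{select_sql}"
    )


query = udf_query(
    function_name="redact",
    arguments=[("value", "VARCHAR")],
    return_type="VARCHAR",
    lambda_name="example-athena-udf",  # hypothetical Lambda function name
    select_sql="SELECT redact(email) AS email_redacted FROM customers",
)
print(query)
```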

Federated query

Q: What is a federated query?
If you have data in sources other than S3, you can use Athena to query the data in place or build pipelines that extract data from multiple data sources and store them on S3. With Athena Federated Query, you can run SQL queries across data stored in relational, nonrelational, object, and custom data sources.

Q: Why should I use federated queries in Athena?
Organizations often store data in a data source that meets the needs of their applications or business processes. These can include relational, key-value, document, in-memory, search, graph, time-series, and ledger databases in addition to storing data in an S3 data lake. Performing analytics on such diverse sources can be complex and time consuming because it typically requires learning new programming languages or database constructs and building complex pipelines to extract, transform, and duplicate data before it can be used for analysis. Athena reduces this complexity by allowing you to run SQL queries on the data where it is. You can use well-known SQL constructs to query data across multiple data sources for quick analysis, or use scheduled SQL queries to extract and transform data from multiple data sources and store them on S3 for further analysis.

Q: Which data sources are supported?
Athena provides built-in connectors to several popular data stores, including Amazon Redshift and Amazon DynamoDB. You can use these connectors to enable SQL analytics use cases on structured, semi-structured, object, graph, time-series, and other data storage types. For a list of supported sources, review the Amazon Athena User Guide: Using Athena Data Source Connectors.

You can also use Athena’s data connector SDK to create a custom data source connector and query it with Athena. Get started by reviewing the documentation and example connector implementation.

Q: Which use cases does federated query enable?
With Athena, you can use your existing SQL knowledge to extract insights from various data sources without learning a new language, developing scripts to extract (and duplicate) data, or managing infrastructure. Using Amazon Athena, you can perform the following tasks:

  • Run on-demand analysis on data spread across multiple data stores using a single tool and SQL dialect.
  • Visualize data in BI applications that push complex, multisource joins down to Athena’s distributed compute engine over ODBC and JDBC interfaces.
  • Design self-service ETL pipelines and event-based data-processing workflows with Athena integration with AWS Step Functions.
  • Unify diverse data sources to produce rich input features for ML model-training workflows.
  • Develop user-facing data-as-a-product applications that surface insights across data mesh architectures.
  • Support analytics use cases while your organization migrates on-premises sources to AWS.

Q: Can I use federated query for ETL?
Athena saves query results to a file on S3. This means you can use Athena to make federated data available to other users and applications. If you want to perform analysis on the data using Athena without repeatedly querying the underlying source, use the Athena CREATE TABLE AS (CTAS) statement. You can also use the Athena UNLOAD statement to query the data and store the results in a specific file format on S3.
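
The two options can be sketched as query text built in Python. The catalog, schema, table, and bucket names below are hypothetical, and the helpers ctas_statement and unload_statement are illustrative assumptions that do no escaping.

```python
def ctas_statement(new_table, select_sql, location, file_format="PARQUET"):
    """Build a CREATE TABLE AS statement that materializes a query result on S3."""
    return (
        f"CREATE TABLE {new_table}\n"
        f"WITH (format = '{file_format}', external_location = '{location}')\n"
        f"AS {select_sql}"
    )


def unload_statement(select_sql, location, file_format="PARQUET"):
    """Build an UNLOAD statement that writes a query result to S3 without creating a table."""
    return (
        f"UNLOAD ({select_sql})\n"
        f"TO '{location}'\n"
        f"WITH (format = '{file_format}')"
    )


# Hypothetical federated source registered under the catalog name example_catalog.
source_query = "SELECT order_id, total FROM example_catalog.sales.orders"

print(ctas_statement("orders_snapshot", source_query, "s3://example-bucket/orders-snapshot/"))
print(unload_statement(source_query, "s3://example-bucket/orders-export/"))
```

CTAS registers a new table that later queries can read, while UNLOAD only writes files.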

Q: How do data source connectors work?
A data source connector is a piece of code that runs on Lambda that translates between your target data source and Athena. When you use a data source connector to register a data store with Athena, you can run SQL queries on federated data stores. When a query runs on a federated source, Athena calls the Lambda function and tasks it with running the parts of your query that are specific to the federated source. To learn more, review the Amazon Athena User Guide: Using Amazon Athena Federated Query

Machine learning

Q: Which use cases does Athena support for embedded ML?
Athena use cases for ML span different industries, as in the following examples. Financial risk data analysts can run what-if analysis and Monte Carlo simulations. Business analysts might run linear regression or forecasting models to predict future values to help them create richer and forward-looking business dashboards that forecast revenues. Marketing analysts can use k-means clustering models to help determine their different customer segments. Security analysts can use logistic regression models to find anomalies and detect security incidents from logs.

Q: Which ML models can be used with Athena?
Athena can invoke any ML model that is deployed on SageMaker. You have the flexibility to train your own model using your proprietary data, or use a model that is pretrained and deployed on SageMaker. For example, cluster analysis would likely be trained on your own data because you want to categorize new records into the same categories that you used for previous records. Alternatively, for predicting real-world sports events, you could use a publicly available model because the training data used would be in the public domain already. Domain-specific or industry-specific predictions will typically be trained on your own data in SageMaker, while undifferentiated ML needs might use external models.

Q: Can I train my ML model using Athena?
No. You cannot train or deploy ML models on SageMaker using Athena. Instead, train your ML model in SageMaker, or use an existing pretrained model that is deployed on SageMaker, and then invoke it from Athena. Read the documentation detailing training steps on SageMaker.

Q: Can I run inference on models deployed on other services such as Comprehend, Forecasting, or Models deployed on my own EC2 cluster?
Athena only supports invoking ML models deployed on SageMaker. We welcome feedback on which other services you want to use with Athena. Email your feedback to athena-feedback@amazon.com.

Q: What are the performance implications of using Athena queries for SageMaker inference?
Operational performance improvements are constantly being added to our features and services. To enhance performance of your Athena ML queries, rows are batched when invoking your SageMaker ML model for inference. At this time, user-provided row batch size overrides are not supported.

Q: Which features does Athena ML support?
Athena offers ML inference (prediction) capabilities wrapped by a SQL interface. You can also call an Athena UDF to invoke pre- or post-processing logic on your result set. Inputs can include any column, record, or table, and multiple calls can be batched together for higher scalability. You can run inference in the Select phase or in the Filter phase. To learn more, refer to the Amazon Athena User Guide: Using Machine Learning (ML) with Amazon Athena.
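
As a sketch of inference in the Select phase, the Python snippet below builds a query that declares a SageMaker-backed function with USING EXTERNAL FUNCTION and then calls it from SELECT. The function name, endpoint name, table, and columns are hypothetical, and the helper inference_query is an assumption made for illustration.

```python
def inference_query(function_name, arguments, return_type, endpoint_name, select_sql):
    """Prefix a SELECT statement with a SageMaker-backed function declaration."""
    signature = ", ".join(f"{name} {dtype}" for name, dtype in arguments)
    return (
        f"USING EXTERNAL FUNCTION {function_name}({signature})\n"
        f"RETURNS {return_type}\n"
        f"SAGEMAKER '{endpoint_name}'\n"
        f"{select_sql}"
    )


query = inference_query(
    function_name="predict_churn",
    arguments=[("tenure_months", "DOUBLE"), ("monthly_spend", "DOUBLE")],
    return_type="DOUBLE",
    endpoint_name="example-churn-endpoint",  # hypothetical SageMaker endpoint
    select_sql=(
        "SELECT customer_id, predict_churn(tenure_months, monthly_spend) AS churn_score "
        "FROM customers"
    ),
)
print(query)
```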

Q: Which ML models can I use?
SageMaker supports various ML algorithms. You can also create your proprietary ML model and deploy it on SageMaker. For example, cluster analysis would likely be trained on your own data because you want to categorize new records into the same categories that you used for previous records. Alternatively, for predicting real-world sports events, you could use a publicly available model because the training data used would be in the public domain.

We expect that domain- or industry-specific predictions will typically be trained on your own data in SageMaker, while undifferentiated ML needs such as machine translation will use external models. 

Security and availability

Q: How do I control access to my data?
Athena allows you to control access to your data by using AWS Identity and Access Management (IAM) policies, access control lists (ACLs), and S3 bucket policies. With IAM policies, you can grant IAM users fine-grained control to your S3 buckets. By controlling access to data on S3, you can restrict users from querying it using Athena.
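
As an example, the Python sketch below builds an IAM policy document that allows read-only access to a single S3 prefix. The bucket and prefix are placeholders, and the helper name is hypothetical. This is not a complete policy for running Athena queries, which also need permissions for Athena itself and for the query results location.

```python
import json


def read_only_prefix_policy(bucket, prefix):
    """Return an IAM policy document granting list and read access to one S3 prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow listing only the keys under the given prefix.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}*"]}},
            },
            {
                # Allow reading objects under the given prefix.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
        ],
    }


policy = read_only_prefix_policy("example-bucket", "web-logs/")
print(json.dumps(policy, indent=2))
```

A user whose policies grant no access to a prefix cannot query the data stored there through Athena.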

Q: Can Athena query encrypted data in S3?
Yes, you can query data that’s encrypted using server-side encryption (SSE) with S3-managed encryption keys, SSE with AWS Key Management Service (KMS)–managed keys, and client-side encryption (CSE) with keys managed by AWS KMS. Athena also integrates with AWS KMS and provides you with an option to encrypt your result sets.

Q: Is Athena highly available?
Yes. Athena is highly available and runs queries using compute resources across multiple facilities, automatically routing queries appropriately if a particular facility is unreachable. Athena uses S3 as its underlying data store, making your data highly available and durable. S3 provides durable infrastructure to store important data. Your data is redundantly stored across multiple facilities and multiple devices in each facility.

Q: Can I provide cross-account access to someone else’s S3 bucket?
Yes, you can provide cross-account access to S3.

Pricing and billing

Q: How is Athena priced?
Athena is priced per query and charges based on the amount of data scanned by the query. You can store data in various formats on S3. If you compress your data, partition it, or convert it to columnar storage formats, you pay less because you scan less data. Converting data to a columnar format allows Athena to read only the columns that it needs to process the query. For more details, review the Amazon Athena pricing page.

Q: Why do I get charged less when I use a columnar format?
Athena charges you for the amount of data scanned per query. Compressing your data allows Athena to scan less data. Converting your data to columnar formats allows Athena to selectively read only required columns to process the data. Partitioning your data also allows Athena to restrict the amount of data scanned. This leads to cost savings and improved performance. For more details, review the Amazon Athena pricing page.
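
The arithmetic can be sketched in Python as follows. The price per terabyte, the compression ratio, and the column and partition fractions are hypothetical inputs chosen for illustration, not Athena's actual rates or measured results. Check the pricing page for current prices.

```python
TB = 1024 ** 4  # bytes per terabyte (binary convention, an assumption for this sketch)


def scanned_bytes(raw_bytes, compression_ratio=1.0, column_fraction=1.0, partition_fraction=1.0):
    """Estimate bytes scanned after compression, column selection, and partition pruning."""
    return raw_bytes / compression_ratio * column_fraction * partition_fraction


def query_cost(bytes_scanned, price_per_tb):
    """Cost of a query that is billed by the amount of data scanned."""
    return bytes_scanned / TB * price_per_tb


PRICE_PER_TB = 5.0  # hypothetical price, for illustration only

# Uncompressed, unpartitioned, row-oriented data: the whole terabyte is scanned.
baseline = query_cost(scanned_bytes(TB), PRICE_PER_TB)

# Assumed 4:1 compression, 3 of 10 equally sized columns read, 1 of 10 partitions touched.
optimized = query_cost(
    scanned_bytes(TB, compression_ratio=4.0, column_fraction=0.3, partition_fraction=0.1),
    PRICE_PER_TB,
)

print(f"baseline: ${baseline:.4f}  optimized: ${optimized:.4f}")
```

The three factors multiply, so applying all of them together reduces the scanned volume far more than any one alone.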

Q: How do I lower my costs?
You can save 30% to 90% on your query costs and get better performance by compressing, partitioning, and converting your data into columnar formats. Each of these operations reduces the amount of data that Athena must scan to run a query. Athena supports Parquet and ORC, two of the most popular open-source columnar formats. You can see the amount of data scanned per query on the Athena console.

Q: Does Athena charge me for failed queries?
No, you are not charged for failed queries.

Q: Does Athena charge me for canceled queries?
Yes. If you cancel a query manually, you are charged for the amount of data scanned up to the point at which you canceled the query.

Q: Are there any additional charges associated with Athena?
Athena queries data directly from S3, so your source data is billed at S3 rates. When Athena runs a query, it stores the results in an S3 bucket of your choice. You are then billed at standard S3 rates for these result sets. It is recommended that you monitor these buckets and use lifecycle policies to control how much data gets retained.
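
One way to limit retained results is an S3 lifecycle rule that expires objects under the results prefix. The Python sketch below builds a configuration in the shape accepted by the S3 PutBucketLifecycleConfiguration operation. The prefix, retention period, and helper name are hypothetical.

```python
def results_expiration_rule(prefix, days):
    """Return an S3 lifecycle configuration that expires objects under a prefix."""
    return {
        "Rules": [
            {
                "ID": f"expire-athena-results-after-{days}-days",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


lifecycle = results_expiration_rule("athena-results/", 30)
print(lifecycle)
```

With boto3, this dictionary could be passed as the LifecycleConfiguration argument of the S3 client's put_bucket_lifecycle_configuration call, along with the name of your results bucket.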

Q: Will I be charged for using Data Catalog?
Yes. You are charged separately for using the Data Catalog. To learn more about Data Catalog pricing, review the AWS Glue pricing page. 

Amazon Athena for Apache Spark

Q: What is Amazon Athena for Apache Spark?
Athena is being expanded to support Spark so that data analysts and data engineers can get the interactive, fully managed experience of Athena but can use Spark in addition to SQL. Spark is a popular open-source, distributed processing system that is enhanced for fast analytics workloads against data of any size and offers a rich system of open-source libraries. You can now build Spark applications in expressive languages such as Python using a simplified notebook experience in the Athena console or through Athena APIs. You can query data from various sources, chain together multiple calculations, and visualize the results of your analyses. For interactive Spark applications, you spend less time waiting and are more productive because Athena starts running applications in under a second. Also, as with Athena today, you get a simplified and purpose-built Spark experience that minimizes the work required for version upgrades, performance tuning, and integration with other AWS services.

Q: Why should I use Athena for Apache Spark?
Use Athena for Apache Spark when you need an interactive, fully managed analytics experience and a tight integration with AWS services. You can now use Spark to perform analytics in Athena using familiar, expressive languages such as Python and the growing environment of Spark packages. You can also now submit your Spark applications through Athena APIs, or into simplified notebooks in the Athena console, and begin running Spark applications in under a second without setting up and tuning the underlying infrastructure. As with its SQL query capabilities, Athena offers a fully managed Spark experience and handles the performance tuning, machine configurations, and software patching automatically so that you do not need to worry about keeping current with version upgrades. Also, Athena is tightly integrated with other analytics services in the AWS system like Data Catalog. Therefore, you can create Spark applications on data in S3 data lakes by referencing tables from your Data Catalog.

Q: How do I start working with Athena for Apache Spark?
To get started with Athena for Apache Spark, you can start a notebook in the Athena console or start a session using the AWS Command Line Interface (CLI) or Athena API. In your notebook, you can submit and shut down Spark applications using Python. Athena also integrates with Data Catalog, so you can work with any data source referenced in the catalog, including data directly in S3 data lakes. Using notebooks, you can now query data from various sources, chain together multiple calculations, and visualize the results of your analyses. For your Spark applications, you can check the execution status and review logs and execution history in the Athena console.

Q: Which Spark version is Athena based on?
Athena for Apache Spark is based on the stable Spark 3.2 release. As a fully managed engine, Athena will provide a custom build of Spark and will handle most Spark version updates automatically in a backward-compatible way without requiring your involvement.
