Posted On: Oct 11, 2018
Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run. Today, we are releasing support for Create Table As Select (CTAS) statements, which create a new table from the results of a SELECT query. Analysts can use CTAS statements to create new tables from a subset of the rows or columns of existing tables, with options to convert the data into columnar formats, such as Apache Parquet and Apache ORC, and to partition it. Athena automatically adds the resulting table and its partitions to the AWS Glue Data Catalog, making them immediately available for subsequent queries. By default, CTAS statements in Athena write data in Parquet format. Other supported formats include Apache ORC, Avro, JSON, and text files, with options to use Gzip or Snappy compression. You can also bucket your data by columns or choose to encrypt it.
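As a rough illustration of the statement shape described above, the following Python sketch assembles a CTAS statement with an output format, an S3 location, and a partition column. The helper `build_ctas`, the table names, and the bucket path are hypothetical and exist only for this example; the `format`, `external_location`, and `partitioned_by` table properties are the ones Athena accepts in the CTAS `WITH` clause.

```python
def build_ctas(new_table, select_sql, fmt="PARQUET",
               external_location=None, partitioned_by=()):
    """Return an Athena CTAS statement as a string (hypothetical helper)."""
    # Parquet is the default output format, matching Athena's CTAS default.
    props = [f"format = '{fmt}'"]
    if external_location:
        # S3 prefix where Athena writes the new table's data files.
        props.append(f"external_location = '{external_location}'")
    if partitioned_by:
        cols = ", ".join(f"'{c}'" for c in partitioned_by)
        props.append(f"partitioned_by = ARRAY[{cols}]")
    with_clause = ",\n  ".join(props)
    return f"CREATE TABLE {new_table}\nWITH (\n  {with_clause}\n)\nAS {select_sql}"


# Assumed example tables: sales_raw (source) and sales_2018_parquet (new).
# The partition column (region) is listed last in the SELECT list,
# which Athena requires for partitioned CTAS tables.
ctas_sql = build_ctas(
    "sales_2018_parquet",
    "SELECT order_id, amount, region FROM sales_raw WHERE year = 2018",
    external_location="s3://example-bucket/sales_2018_parquet/",
    partitioned_by=["region"],
)
print(ctas_sql)
```

The generated statement can be pasted into the Athena console or submitted through any Athena client; the sketch only builds the text and does not contact AWS.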
CTAS statements help reduce cost and improve performance by allowing users to run queries on smaller tables constructed from larger tables. For example, you can use a single CTAS statement to select specific columns from two different tables that store data in JSON format, convert the results into a columnar format such as Parquet, and add the new table to the AWS Glue Data Catalog, making subsequent queries easier, faster, and cheaper. With CTAS statements, analysts no longer have to rely on data engineering teams to create tables aligned to their specific workloads, which enables a self-service environment. Additional examples of CTAS statements are available in the Athena documentation. CTAS statements are charged based on the bytes scanned in the SELECT phase, similar to how Athena charges for SELECT queries.
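The two-table example above might look like the following sketch, which joins two JSON-backed tables, keeps a few columns, and writes the result as Parquet. The table names, column names, database, and S3 output path are all assumptions made for illustration. The request dictionary uses the parameter names of the Athena `StartQueryExecution` API, so it could be passed to boto3 as `boto3.client("athena").start_query_execution(**request)`; the sketch itself does not call AWS.

```python
# Hypothetical CTAS statement: orders_json and customers_json stand in for
# two existing tables whose underlying data is stored as JSON.
JOIN_CTAS = """
CREATE TABLE orders_with_customers
WITH (format = 'PARQUET')
AS
SELECT o.order_id, o.amount, c.customer_name
FROM orders_json AS o
JOIN customers_json AS c
  ON o.customer_id = c.customer_id
""".strip()


def build_request(query, database, output_location):
    """Assemble keyword arguments for Athena's StartQueryExecution call."""
    return {
        "QueryString": query,
        # Database in which the table names are resolved.
        "QueryExecutionContext": {"Database": database},
        # S3 prefix where Athena stores query results and metadata.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# Assumed database name and results bucket.
request = build_request(
    JOIN_CTAS,
    database="analytics",
    output_location="s3://example-bucket/athena-results/",
)
print(request["QueryString"])
```

Because only the three selected columns are written, and in a columnar format, later queries against `orders_with_customers` scan less data than queries that join the original JSON tables each time.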