How can I store an Athena query output in a format other than CSV, such as a compressed format?

Last updated: 2020-05-22

I want to store Amazon Athena query results in a format other than CSV, such as JSON or a compressed format. How can I do that?

Resolution

Athena supports CSV output files only. If you want to store query output files in a different format, use a CREATE TABLE AS SELECT (CTAS) query and configure the format property. After the query completes, drop the CTAS table. Keep the following in mind:

  • You can set format to ORC, PARQUET, AVRO, JSON, or TEXTFILE. If you don't specify a format for the CTAS query, Athena uses Parquet by default.
  • The name of the parameter, format, must be listed in lowercase, or your CTAS query fails. For an example, see Example: Writing Query Results to a Different Format.
  • By default, all CTAS queries use GZIP compression. For Parquet and ORC, you can use the parquet_compression and orc_compression options to specify other compression types, such as SNAPPY. For an example, see Example: Specifying Data Storage and Compression Formats.
  • For text-based formats, you can specify a field_delimiter like this: WITH (field_delimiter = ','). If you don't specify a field delimiter, Athena uses \001 by default.
  • If you use the external_location parameter in the CTAS query, be sure that the Amazon Simple Storage Service (Amazon S3) location has no data. Otherwise, you might see an error like this: "HIVE_PATH_ALREADY_EXISTS: Target directory for table 'table_name' already exists: s3://AWSDOC-EXAMPLE-BUCKET". For more information about the external_location parameter, see CTAS Table Properties.

Did this article help you?

Anything we could improve?


Need more help?