How can I store an Athena query output in a format other than CSV, such as a compressed format?
Last updated: 2020-05-22
I want to store Amazon Athena query results in a format other than CSV, such as JSON or a compressed format. How can I do that?
Athena supports CSV output files only. If you want to store query output files in a different format, use a CREATE TABLE AS SELECT (CTAS) query and configure the format property. After the query completes, drop the CTAS table. Keep the following in mind:
- You can set format to ORC, PARQUET, AVRO, JSON, or TEXTFILE. If you don't specify a format for the CTAS query, Athena uses Parquet by default.
- The name of the parameter, format, must be listed in lowercase, or your CTAS query fails. For an example, see Example: Writing Query Results to a Different Format.
- By default, all CTAS queries use GZIP compression. For Parquet and ORC, you can use the parquet_compression and orc_compression options to specify other compression types, such as SNAPPY. For an example, see Example: Specifying Data Storage and Compression Formats.
- For text-based formats, you can specify a field_delimiter like this: WITH (field_delimiter = ','). If you don't specify a field delimiter, Athena uses \001 by default.
- If you use the external_location parameter in the CTAS query, be sure that the Amazon Simple Storage Service (Amazon S3) location has no data. Otherwise, you might see an error like this: "HIVE_PATH_ALREADY_EXISTS: Target directory for table 'table_name' already exists: s3://AWSDOC-EXAMPLE-BUCKET". For more information about the external_location parameter, see CTAS Table Properties.