Why do I get zero records when I query my Amazon Athena table?

6 minute read

I ran a CREATE TABLE statement in Amazon Athena with expected columns and their data types. When I run the query SELECT * FROM table-name, the output is "Zero records returned."

Resolution

There are several common reasons why the query might return zero records. Based on your use case, see the related section for common reasons and troubleshooting steps.

AWS Glue partitions

File selected in crawler settings

If you use a crawler, then confirm that the crawler points to the Amazon Simple Storage Service (Amazon S3) bucket instead of a file. Also, make sure that the S3 bucket path doesn't include a trailing slash. For example, use s3://doc-example-bucket/new instead of s3://doc-example-bucket/new/data.json.

Incorrect LOCATION path

Verify the Amazon S3 LOCATION path for the input data. If the input LOCATION path is incorrect, Athena returns zero records. Make sure that the S3 bucket path doesn't include a file, trailing slash, or wild card. For example, use s3://doc-example-bucket/new instead of s3://doc-example-bucket/new/data.json.

Double slash in LOCATION path

Athena doesn't support table location paths that include a double slash (//). For example, the following LOCATION path returns empty results: s3://doc-example-bucket/myprefix//input//

To resolve this issue, copy the files to a location that doesn't have double slashes. Following is an AWS Command Line Interface (AWS CLI) command to copy files:

aws s3 cp s3://doc-example-bucket/myprefix//input// s3://doc-example-bucket/myprefix/input/ --recursive

Note: If you receive errors when you run AWS CLI commands, then see Troubleshoot AWS CLI errors. Also, make sure that you're using the most recent AWS CLI version.

Special characters

It's a best practice to include only underscores or capitalization in your table's column names. For example, you can name a column table_name, but not table-name.

Note: AWS Glue and Athena can't read camel case, capital letters, or special characters other than the underscore.

Data for multiple tables stored in the same S3 prefix

AWS Glue crawlers create separate tables for data that's stored in the same S3 prefix. However, when you query those tables in Athena, you get zero records. For example, your Athena query returns zero records if your table location is similar to these examples:

s3://doc-example-bucket/table1.csv
s3://doc-example-bucket/table2.csv

To resolve this issue, create individual S3 prefixes for each table similar to these examples:

s3://doc-example-bucket/table1/table1.csv
s3://doc-example-bucket/table2/table2.csv

Then, run a query similar to this example to update the location for your table table1:

ALTER TABLE table1 SET LOCATION 's3://doc-example-bucket/table1';

Partitions not yet loaded

Athena creates metadata only when a table is created. The data is parsed only when you run the query. If your table has defined partitions, the partitions might not yet be loaded into the AWS Glue Data Catalog or the internal Athena data catalog. Use MSCK REPAIR TABLE or ALTER TABLE ADD PARTITION to load the partition information into the catalog.

If the partitions are stored in a format that Athena supports, then run MSCK REPAIR TABLE to load a partition's metadata into the catalog. For example, if you have a table that's partitioned on Year, Athena expects to find the data at Amazon S3 paths similar to these examples:

s3://doc-example-bucket/athena/inputdata/year=2020/data.csv
s3://doc-example-bucket/athena/inputdata/year=2019/data.csv
s3://doc-example-bucket/athena/inputdata/year=2018/data.csv

If the data is located at the Amazon S3 paths that Athena expects, run a command similar to this example to repair the table:

CREATE EXTERNAL TABLE Employee (    Id INT,
    Name STRING,
    Address STRING
) PARTITIONED BY (year INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://doc-example-bucket/athena/inputdata/';

After the table is created, load the partition information:

MSCK REPAIR TABLE Employee;

After the data is loaded, run the following query again:

SELECT * FROM Employee;

If the partitions aren't stored in a format that Athena supports, or are in different S3 paths, run ALTER TABLE ADD PARTITION for each partition.

For example, suppose that your data is located at these example Amazon S3 paths:

s3://doc-example-bucket/athena/inputdata/2020/data.csv
s3://doc-example-bucket/athena/inputdata/2019/data.csv
s3://doc-example-bucket/athena/inputdata/2018/data.csv

With these paths, run a command similar to this example:

ALTER TABLE Employee ADD    PARTITION (year=2020) LOCATION 's3://doc-example-bucket/athena/inputdata/2020/'
    PARTITION (year=2019) LOCATION 's3://doc-example-bucket/athena/inputdata/2019/'
    PARTITION (year=2018) LOCATION 's3://doc-example-bucket/athena/inputdata/2018/'

After the data is loaded, run the following query again:

SELECT * FROM Employee;

Hive hidden files

Verify that your file names don't start with an underscore (_) or a period (.).

Athena considers these files as placeholders and ignores them when you process a query. For more information, see Athena can't read hidden files. If all the files in your S3 path have names that start with an underscore or a period, then you get zero records.

Note: If your S3 path includes placeholders along with files whose names start with different characters, Athena ignores only the placeholders and queries the other files. Therefore, you might get one or more records.

For example:

s3://doc-example-bucket/athena/inputdata/_file1
s3://doc-example-bucket/athena/inputdata/.file2

Partition Projection

Values not within the range bounds for partition projection

Queries for values that are beyond the range bounds defined for partition projection don't return an error. Instead, the query returns zero rows.

For example, suppose your data starts in 2020 and is defined as: projection.timestamp.range'='2020/01/01,NOW

If you run this example query, the query completes successfully, but returns zero rows:

SELECT * FROM table-name WHERE timestamp = '2019/02/02'

Storage template doesn't follow the default partitioning scheme

If your Amazon S3 file locations don't follow the location pattern .../column=value..., then you must specify a custom Amazon S3 partitioning scheme. If you don't define a custom scheme, then your query returns zero records.

If you use a custom template, the template must contain a placeholder for each partition column. To define a custom S3 partitioning scheme, see Specifying custom S3 storage locations.

Incorrect custom storage template

If you use a custom template, then be sure that the template allows Athena to build your partition locations. Also, make sure that each placeholder and the Amazon S3 path ends with a single forward slash.

For example, suppose you define a partition column year with the DDL statement PARTITIONED BY (year string) and your S3 file locations is s3://doc-example-bucket/athena/inputdata/Year=2022/. This location returns zero records. Update the Amazon S3 storage location to: s3://doc-example-bucket/athena/inputdata/Year=${year}.

Partition properties

If you have a partition columns enum, integer, or date type, make sure you set the partition properties correctly. The configurations must allow Athena to build partition locations that match the structure of your data on Amazon S3.

For example, suppose you have time-related data that comes in on daily basis one hour after midnight in the location: s3://doc-example-bucket/athena/inputdata/2022-01-01-01-00.

In this example, the Athena table uses these partition properties:

'projection.dt.format' = 'yyyy-MM-dd-HH-mm','projection.dt.range' = '2022-01-01-00-00,NOW',
'projection.dt.interval' = '1',
'projection.dt.interval.unit' = 'DAYS'

Queries that run on the example table return zero records. This is because the property's file locations projected correspond to midnight s3://doc-example-bucket/athena/inputdata/2022-01-01-00-00.

To resolve this issue, set the property 'projection.dt.range' to '2022-01-01-01-00,NOW'.