I get errors when I try to read JSON data in Amazon Athena

Last updated: 2019-05-09

When I try to read JSON data in Amazon Athena, I receive NULL or incorrect data errors. How can I resolve this?

Resolution

Check the following common issues:

Use the correct JSON SerDe

Athena processes JSON data using one of two JSON SerDes:

If you're not sure which SerDe you used, try both of the SerDe versions.

Use one line per record

The following JSON records are formatted correctly:

{ "id" : 50, "name":"John" }
{ "id" : 51, "name":"Jane" }
{ "id" : 53, "name":"Jill" }

The following JSON records are formatted incorrectly:

{
  "id" : 50,
  "name":"John"
},
{
  "id" : 51,
  "name":"Jane"
}
{
  "id" : 53,
  "name":"Jill"
}

These records are also formatted incorrectly:

{ "id" : 50, "name":"John" } { "id" : 51, "name":"Jane" } { "id" : 53, "name":"Jill" }

Use case-insensitive columns or set the case.insensitive property to false

Athena is case-insensitive by default. If you have column names that differ only by case (for example, “Column” and “column”), Athena generates an error, and your data is not visible in Athena. The easiest way to avoid this problem is to generate your data with case-insensitive columns.

If you use the OpenX SerDe, you can use case-sensitive key names. To do this, set the case.insensitive SerDe property to false and add mapping for the uppercase key. For example, to use upper and lowercase columns like this:

{"Username": "bob1234", "username": "bob" }

Use these SerDe properties:

CREATE external TABLE casesensitive_json (user_name String,username String)
ROW FORMAT serde 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( 'mapping.user_name' = 'Username','case.insensitive'='false')
LOCATION 's3://input_path';

For more information about the case.insensitive property, see OpenX JSON SerDe.

Provide an option to ignore malformed records

If you use the OpenX SerDe, you can opt to ignore malformed records, as shown in the following example. When ignore.malformed.json is set to true, malformed records return as NULL.

CREATE EXTERNAL TABLE json (
    a string,
    b int
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( 'ignore.malformed.json' = 'true')
LOCATION 's3://bucket/path/';

After the table is created, run your new query on the new table.


Did this article help you?

Anything we could improve?


Need more help?