How do I resolve "HIVE_CURSOR_ERROR: Row is not a valid JSON Object - JSONException: Duplicate key" when reading files from AWS Config in Athena?

3 minute read

When I use Amazon Athena to query AWS Config files, I get the following error: "Error: HIVE_CURSOR_ERROR: Row is not a valid JSON Object - JSONException: Duplicate key."

Short description

This error commonly occurs during the following situations:

AWS Config resources have multiple tags with the same name.
Some of the tags are in uppercase and some are in lowercase.

For example, the following record uses tc:Name and tc:name JSON keys:

{
  "fileVersion": "1.0",
  "configSnapshotId": "35eced35-a13a-45b7-81e4-446e35616e70",
  "configurationItems": [
  {
    "tags": { "tc:Name": "6", "tc:name": "abc6-38" }
  },
  {
    "tags": { "tc:Name": "6", "tc:name": "abc6-38" }
  },
  {
    "tags": { "tc:Name": "6" }
  },
   {
    "tags": { "tc:name": "6" }
  }
  ]
}

Resolution

Run a CREATE TABLE statement similar to the below example. This statement creates an Athena table, sets case.insensitive to false, and maps column names to JSON keys that aren't identical to the column names. Before running this statement, note the following:

In the LOCATION field, replace s3://awsexamplebucket/AWSLogs/ with the name of your Amazon Simple Storage Service (Amazon S3) bucket.
Replace all mapping properties with your column names and JSON keys (for example, mapping.fileversion'='fileVersion').

CREATE EXTERNAL TABLE aws_config_configuration_snapshot (
 fileversion STRING,
 configsnapshotid STRING,
 configurationitems ARRAY < STRUCT <
    configurationItemVersion : STRING,
    configurationItemCaptureTime : STRING,
    configurationStateId : BIGINT,
    awsAccountId : STRING,
    configurationItemStatus : STRING,
    resourceType : STRING,
    resourceId : STRING,
    resourceName : STRING,
    ARN : STRING,
    awsRegion : STRING,
    availabilityZone : STRING,
    configurationStateMd5Hash : STRING,
    configuration : STRING,
    supplementaryConfiguration : MAP < STRING, STRING >,
    tags: MAP < STRING, STRING >,
    resourceCreationTime : STRING > >
) 
PARTITIONED BY ( dt STRING , region STRING )
ROW FORMAT SERDE 
 'org.openx.data.jsonserde.JsonSerDe' 
WITH SERDEPROPERTIES ( 
  'case.insensitive'='false,
  'mapping.fileversion'='fileVersion',
  'mapping.configsnapshotid'='configSnapshotId',
  'mapping.configurationitems'='configurationItems',
  'mapping.configurationitemversion'='configurationItemVersion',
  'mapping.configurationitemcapturetime'='configurationItemCaptureTime',
  'mapping.configurationstateid'='configurationStateId',
  'mapping.awsaccountid'='awsAccountId',
  'mapping.configurationitemstatus'='configurationItemStatus',
  'mapping.resourcetype'='resourceType',
  'mapping.resourceid'='resourceId',
  'mapping.resourcename'='resourceName',
  'mapping.arn'='ARN',
  'mapping.awsregion'='awsRegion',
  'mapping.availabilityzone'='availabilityZone',
  'mapping.configurationstatemd5hash'='configurationStateMd5Hash',
  'mapping.supplementaryconfiguration'='supplementaryConfiguration',
  'mapping.configurationstateid'='configurationStateId'
  )
  LOCATION 's3://awsexamplebucket/AWSLogs/';

If you already have a table with partitions loaded, then you can add the new SerDe properties to the table. Use a statement similar to the following:

ALTER TABLE aws_config_configuration_snapshot SET TBLPROPERTIES (  
'case.insensitive'='false',
'mapping.fileversion'='fileVersion',
'mapping.configsnapshotid'='configSnapshotId',
'mapping.configurationitems'='configurationItems',
'mapping.configurationitemversion'='configurationItemVersion',
'mapping.configurationitemcapturetime'='configurationItemCaptureTime',
'mapping.configurationstateid'='configurationStateId',
'mapping.awsaccountid'='awsAccountId',
'mapping.configurationitemstatus'='configurationItemStatus',
'mapping.resourcetype'='resourceType',
'mapping.resourceid'='resourceId',
'mapping.resourcename'='resourceName',
'mapping.arn'='ARN',
'mapping.awsregion'='awsRegion',
'mapping.availabilityzone'='availabilityZone',
'mapping.configurationstatemd5hash'='configurationStateMd5Hash',
'mapping.supplementaryconfiguration'='supplementaryConfiguration',
'mapping.configurationstateid'='configurationStateId')

When the table is ready, access the tags using configurationItem.tags['TAGNAME']. For example, to access the tc:Name tag, run the following query:

SELECT configurationItem.tags['tc:Name']
FROM your_table
CROSS JOIN unnest(configurationItems) AS t(configurationItem)
WHERE configurationItem.tags['tc:Name'] IS NOT NULL

Related information

Create tables in Amazon Athena from nested JSON and mappings using JSONSerDe

Why do I get errors when I try to read JSON data in Amazon Athena?

Hive-JSON-Serde (Github)

Topics

Analytics

Relevant content

AWS config with Athena - Issue
devops-stchealth
asked 2 years ago
Row is not a valid JSON Object - JSONException: Expected a ',' or '}'
AWS-User-8593327
asked 2 years ago
Duplicate Rows When Loading Delta Native tables In Athena Engine V3
Vlad
asked 10 months ago
Looking to Resolve Error: Resolved credential object is not valid
Neville
asked 8 months ago
Need solution for this Error: Row is not a valid JSON Object - JSONException: Duplicate key "lastupdated"
Aditya_Jagtap
asked a year ago
How do I resolve the "HIVE_CURSOR_ERROR" exception when I query a table in Amazon Athena?
AWS OFFICIALUpdated 3 months ago
Why do I get errors when I try to read JSON data in Amazon Athena?
AWS OFFICIALUpdated 2 months ago
How do I run Athena queries for pretty-print JSON files?
AWS OFFICIALUpdated a year ago
How do I resolve the "Model validation failed (#: extraneous key [Key] is not permitted)" error in CloudFormation?
AWS OFFICIALUpdated 2 years ago
How to use the Python json module when decoding DynamoDB items
EXPERT
Brettski-AWS
published 7 months ago