How do I resolve the "The provided key element does not match the schema" error when importing DynamoDB tables using Hive on Amazon EMR?

Last updated: 2020-10-28

When I try to import Amazon DynamoDB tables into Amazon EMR using Hive, I get an error message similar to the following: "The provided key element does not match the schema (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ValidationException)."

Resolution

This error usually happens when the table schema is incorrect or the data is corrupt or mismatched. If you still get the error message after ruling out these common causes, check the Hive application logs. The logs are located in the /mnt/var/log/hive directory on the master node of the EMR cluster. Check for entries like the following:
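For example, assuming that you have SSH access to the master node, you can search the Hive logs for the validation exception. The directory below is the EMR default; adjust the path if your cluster is configured differently:

```shell
# On the EMR master node: search all Hive logs for the
# DynamoDB validation exception (path is the EMR default).
sudo grep -r "does not match the schema" /mnt/var/log/hive/
```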

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"countryasin":"LOCATION '${INPUT}';","hts_type":null,"hts_code":null}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:565)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:86)
... 17 more
Caused by: java.lang.RuntimeException: com.amazonaws.services.dynamodbv2.model.AmazonDynamoDBException: The provided key element does not match the schema (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ValidationException; Request ID: 0FF3KB36M2SJD8E79BUPOUP943VV4KQNSO5AEMVJF66Q9ASUAAJG)

The row that's mentioned in the error message ({"countryasin":"LOCATION '${INPUT}';","hts_type":null,"hts_code":null}) is part of the Hive script. This Hive script is in the same Amazon Simple Storage Service (Amazon S3) location as the input files. Because the script sits in the input path, the import job treats the script file itself as input data and tries to write its contents to the DynamoDB table, which fails validation against the table's key schema. To resolve this problem, move the Hive script to an Amazon S3 location that is separate from the input files.
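For example, if the script and the input files share an S3 prefix, moving the script into its own prefix prevents the job from reading it as input. The bucket, prefixes, and file names below are hypothetical; substitute your own:

```shell
# Hypothetical layout that triggers the error: the Hive script
# sits alongside the input data, so the import job reads it as input.
#   s3://my-bucket/input/data.csv
#   s3://my-bucket/input/import.q    <-- problem
#
# Move the script to a separate prefix, then reference it from
# there when you submit the job.
aws s3 mv s3://my-bucket/input/import.q s3://my-bucket/scripts/import.q
```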