How do I resolve the "The provided key element does not match the schema" error when importing DynamoDB tables using Hive on Amazon EMR?

When I try to import Amazon DynamoDB tables into Amazon EMR using Hive, I get the following error: "The provided key element does not match the schema (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ValidationException)."

Resolution

This error usually occurs when the schema is incorrect or when the data is corrupt or mismatched. If you still get the error message after ruling out these common causes, then check the Hive application logs. If you turned on logging, then you can find the logs in Amazon Simple Storage Service (Amazon S3) at a location similar to the following:

s3://example-log-location/example-cluster-id/node/example-ec2-master-instance-id/applications/hive

Otherwise, you can find the logs in the /mnt/var/log/hive directory on the master node of the EMR cluster. Connect to the master node, and then check the logs. The relevant log entries look similar to the following:

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"countryasin":"LOCATION '${INPUT}';","hts_type":null,"hts_code":null}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:565)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:86)
... 17 more
Caused by: java.lang.RuntimeException: com.amazonaws.services.dynamodbv2.model.AmazonDynamoDBException: The provided key element does not match the schema (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ValidationException; Request ID: 0FF3KB36M2SJD8E79BUPOUP943VV4KQNSO5AEMVJF66Q9ASUAAJG)
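
When the log is long, it can help to pull out only the rows that Hive reported as failing. The following is a minimal sketch, not part of any AWS tooling: the function name extract_failed_rows is hypothetical, and the sketch assumes that each failing row appears on a single line after the text "Hive Runtime Error while processing row", as in the excerpt above.

```python
import json
import re

# Matches the row payload that Hive prints in the "processing row" message.
ROW_PATTERN = re.compile(r"Hive Runtime Error while processing row (\{.*\})")


def extract_failed_rows(log_text):
    """Return the raw row strings that Hive reported as failing (hypothetical helper)."""
    rows = []
    for line in log_text.splitlines():
        match = ROW_PATTERN.search(line)
        if match:
            rows.append(match.group(1))
    return rows


# Sample line copied from the log excerpt above.
sample_log = (
    "Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: "
    "Hive Runtime Error while processing row "
    '{"countryasin":"LOCATION \'${INPUT}\';","hts_type":null,"hts_code":null}\n'
    "at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:565)\n"
)

for raw_row in extract_failed_rows(sample_log):
    # Parsing is best effort: this sample row is valid JSON, other rows might not be.
    print(json.loads(raw_row))
```

If a value in the extracted row looks like a line of code rather than table data, that is a strong hint that a non-data file is being read as input.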

The row that's mentioned in the error message ({"countryasin":"LOCATION '${INPUT}';","hts_type":null,"hts_code":null}) isn't table data. It's a line from the Hive script. This happens when the Hive script is stored in the same Amazon S3 location as the input files: the import job reads the script as input data and tries to write its lines to the DynamoDB table. Because those lines don't contain valid key values, DynamoDB rejects them. To resolve this problem, move the Hive script to a different Amazon S3 location.
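
Before you rerun the import, you can check whether any script files still sit under the input prefix. The following is a minimal sketch under stated assumptions: the function name find_scripts_in_input, the file extensions, and the example keys are all hypothetical, so adjust them to match your own naming.

```python
# Assumed script extensions. This list is a guess, not an AWS convention.
SCRIPT_EXTENSIONS = (".q", ".hql", ".sql")


def find_scripts_in_input(keys, input_prefix):
    """Return the object keys under input_prefix that look like Hive scripts."""
    # Normalize the prefix so "import/input" doesn't also match "import/input-old/".
    prefix = input_prefix if input_prefix.endswith("/") else input_prefix + "/"
    return [
        key
        for key in keys
        if key.startswith(prefix) and key.lower().endswith(SCRIPT_EXTENSIONS)
    ]


# Hypothetical object keys, for illustration only.
example_keys = [
    "import/input/part-00000",
    "import/input/import-table.q",
    "import/scripts/other.q",
]

print(find_scripts_in_input(example_keys, "import/input"))
# prints ['import/input/import-table.q']
```

You can get the list of keys from the Amazon S3 console or with the aws s3 ls command, and then move any script that the check finds with aws s3 mv.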


Related information

Optimizing performance for Amazon EMR operations in DynamoDB

DynamoDBMapper class

View log files

AWS OFFICIAL, updated a year ago