How do I resolve the "HIVE_CURSOR_ERROR" exception when I query a table in Amazon Athena?

When I run queries on my Amazon Athena table, I get a "HIVE_CURSOR_ERROR" exception.

Resolution

You might get this exception under one of the following conditions:

  • The data objects in your table location are corrupt, aren't valid, or are incorrectly compressed.
  • The records within the table data aren't valid (example: a malformed JSON record).

Common HIVE_CURSOR_ERROR exceptions

You might get one of the following exceptions when there's an issue with these objects in your Amazon Simple Storage Service (Amazon S3) table location:

  • HIVE_CURSOR_ERROR: incorrect header check
  • HIVE_CURSOR_ERROR: invalid block type
  • HIVE_CURSOR_ERROR: incorrect data check
  • HIVE_CURSOR_ERROR: Unexpected end of input stream
  • HIVE_CURSOR_ERROR: invalid stored block lengths
  • HIVE_CURSOR_ERROR: invalid distance code

If you recently added new objects to your Amazon S3 table location, then be sure that these objects are valid and aren't corrupt. Download the objects, and then inspect them with your preferred tool. For example, run gzip -t against GZIP-compressed objects to confirm that they decompress cleanly.

If your table is partitioned, then check whether you can query individual partitions. New partitions might contain objects that aren't valid, so querying one partition at a time can isolate the problem, as in the sketch that follows.
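The following is a minimal sketch of a per-partition query. The table name (example_table), column name (example_column), partition column (dt), and partition value are placeholders for your own schema:

SELECT "$path", example_column
FROM example_table
WHERE dt = '2024-01-01'
LIMIT 10

If the query fails with a HIVE_CURSOR_ERROR for one partition but succeeds for others, then inspect the objects under that partition's S3 prefix.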

Specific HIVE_CURSOR_ERROR exceptions

HIVE_CURSOR_ERROR: Row is not a valid JSON Object

You get this error when a row within the table isn't a valid JSON record.

To resolve this issue, do the following:

  1. Recreate the table with the SerDe property 'ignore.malformed.json' = 'true' (see the sketch after this list).
  2. Query the new table to identify the files with malformed records, with a query similar to the following:
SELECT "$path" FROM example_table WHERE example_column IS NULL

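The following is a minimal sketch of step 1, assuming the OpenX JSON SerDe and placeholder table, column, and Amazon S3 names:

CREATE EXTERNAL TABLE example_table (
    example_column STRING
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ('ignore.malformed.json' = 'true')
LOCATION 's3://example_bucket/example_prefix/'

With 'ignore.malformed.json' = 'true', the SerDe returns NULL column values for malformed rows instead of failing the query, which is what the "$path" query in step 2 relies on.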
For more information, see Why do I get errors when I try to read JSON data in Amazon Athena? and OpenX JSON SerDe.

HIVE_CURSOR_ERROR: Corrupted uncompressed block

You get this error when you use LZO format to compress objects in your table location.

To resolve this error with LZO-compressed data, recreate the table with the INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'. For example:

CREATE EXTERNAL TABLE example_table (
    example_column_1 STRING,
    example_column_2 STRING
)
STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://example_bucket/example_prefix/'

You can also recompress the objects with a different compression format, such as GZIP, to resolve this error, as in the following sketch.
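The following is a minimal sketch that assumes comma-delimited text objects recompressed as GZIP (.gz); Athena reads GZIP-compressed text files with the default TEXTFILE format, so no special input format is needed. The table, column, and S3 names are placeholders:

CREATE EXTERNAL TABLE example_table (
    example_column_1 STRING,
    example_column_2 STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://example_bucket/example_prefix/'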

HIVE_CURSOR_ERROR: Can not read value at 0 in block 0 in file

You get this error when the S3 path contains Parquet objects that aren't valid. Check whether your Parquet objects contain DECIMAL data types. If you used Spark to write Parquet objects that contain DECIMAL data types, then they might be incompatible with Hive. To check for this condition, try to read the Parquet objects from a Spark job.

If the Spark job can read the Parquet objects, then rewrite the objects in legacy mode in Spark with the following configuration:

spark.sql.parquet.writeLegacyFormat = true
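The following is a minimal sketch of one way to apply the setting and rewrite the data, assuming a Spark 3 SQL session and placeholder table and location names (you can also set the option in code or in spark-defaults.conf):

SET spark.sql.parquet.writeLegacyFormat=true;

CREATE TABLE example_table_legacy
USING PARQUET
LOCATION 's3://example_bucket/example_prefix_legacy/'
AS SELECT * FROM example_table;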

For information on Spark configuration options for Parquet, see Configuration on the Spark SQL website.

HIVE_CURSOR_ERROR: org.apache.hadoop.io.ArrayWritable cannot be cast to org.apache.hadoop.io.Text

You get this error if you used an incorrect SerDe for the table definition. For example, the table might use a JSON SerDe, and the source data includes Parquet objects.

To resolve this error, check the source data, and confirm that the table definition uses the correct SerDe. For more information, see Supported SerDes and data formats.
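To confirm which SerDe the table currently uses, inspect the table's DDL with a statement similar to the following (example_table is a placeholder):

SHOW CREATE TABLE example_table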

HIVE_CURSOR_ERROR errors with files read from AWS Config

HIVE_CURSOR_ERROR: java.io.IOException: Start of Object expected

You get this error when the JsonSerDe can't find the start of a JSON object in a record. This happens either because the data doesn't match the schema or because the files are corrupt when you run the query.

To resolve this error, use a different SerDe, such as org.openx.data.jsonserde.JsonSerDe, instead of the default org.apache.hive.hcatalog.data.JsonSerDe. Then, set the SerDe property 'ignore.malformed.json' = 'true', as in the CREATE TABLE sketch shown earlier for the "Row is not a valid JSON Object" error.

For more information, see JSON SerDe libraries.

HIVE_CURSOR_ERROR: java.io.IOException: Start token not found where expected.

You get this error when the Athena table location contains files other than the ones that AWS Config delivers. For example, this can happen when the CSV output location for Athena query results is set to the same location as the AWS Config files in JSON format.

To resolve this error, move or delete files that aren't generated by AWS Config. For Athena query results, the output files are stored in the following path pattern:

QueryResultsLocationInS3/[QueryName|Unsaved/yyyy/mm/dd/]

HIVE_CURSOR_ERROR: Row is not a valid JSON Object - JSONException: Duplicate key

You get this error when AWS Config resources have multiple tags with the same name, or when some tag names differ only in case (for example, uppercase and lowercase versions of the same key).

To resolve this error, see How do I resolve "HIVE_CURSOR_ERROR: Row is not a valid JSON Object - JSONException: Duplicate key" when reading files from AWS Config in Athena?

Related information

Troubleshooting in Athena
