How do I resolve the error "GENERIC_INTERNAL_ERROR" when I query a table in Amazon Athena?

7 minute read
1

When I query my Amazon Athena table, I receive the error "GENERIC_INTERNAL_ERROR".

Short description

The different types of GENERIC_INTERNAL_ERROR exceptions and their causes are the following:

  • GENERIC_INTERNAL_ERROR: null: You might see this exception under either of the following conditions:
  • You have a schema mismatch between the data type of a column in table definition and the actual data type of the dataset.
  • You're running a CREATE TABLE AS SELECT (CTAS) query with inaccurate syntax.
  • GENERIC_INTERNAL_ERROR: parent builder is null: You might see this exception when you query a table with columns of data type array, and the SerDe format OpenCSVSerDe. OpenCSVSerde format doesn't support the array data type.
  • GENERIC_INTERNAL_ERROR: Value exceeds MAX_INT: You might see this exception when the source data column is defined with the data type INT and has a numeric value greater than 2,147,483,647.
  • GENERIC_INTERNAL_ERROR: Value exceeds MAX_BYTE: You might see this exception when the source data column has a numeric value exceeding the allowable size for the data type BYTE. The data type BYTE is equivalent to TINYINT. TINYINT is an 8-bit signed INTEGER in two’s complement format with a minimum value of -128 and a maximum value of 127.
  • GENERIC_INTERNAL_ERROR: Number of partition values does not match number of filters: You might see this exception if you have inconsistent partitions on Amazon Simple Storage Service (Amazon S3) data. You might have inconsistent partitions under either of the following conditions:
  • Partitions on Amazon S3 have changed (example: new partitions added).
  • Number of partition columns in the table do not match that in the partition metadata.
  • GENERIC_INTERNAL_ERROR: Multiple entries with same key: You might see this exception due to Keys (columns) in the JSON data when:
  • The same name is used twice.
  • The same name is used when it’s converted to all lowercase.

Resolution

GENERIC_INTERNAL_ERROR:null

Column data type mismatch: Be sure that the column data type in the table definition is compatible with the column data type in the source data. Athena uses schema-on-read technology. This means that your table definitions are applied to your data in Amazon S3 when the queries are processed.

For example, when a table created on Parquet files:

  • Athena reads the schema from the files
  • Then Athena validates the schema against the table definition where the Parquet file is queried.

If the underlying data type of a column doesn't match the data type mentioned during table definition, then the Column data type mismatch error is shown.

To resolve this issue, verify that the source data files aren't corrupted. If there is a schema mismatch between the source data files and table definition, then do either of the following:

  • Update the schema using the AWS Glue Data Catalog.
  • Create a new table using the updated table definition.

If the source data files are corrupted, delete the files, and then query the table.

Inaccurate syntax: You might get the "GENERIC INTERNAL ERROR:null" error when both of the following conditions are true:

To avoid this error, you must use different column names for partitioned_by and bucketed_by properties when you use the CTAS query. To resolve this error, create a new table by choosing different column names for partitioned_by and bucketed_by properties.

GENERIC_INTERNAL_ERROR: parent builder is null

To resolve this error, find the column with the data type array, and then change the data type of this column to string. To change the column data type to string, do either of the following:

  • Update the schema in the Data Catalog.
  • Create a new table by choosing the column data type as string.

Run the SHOW CREATE TABLE command to generate the query that created the table. Then view the column data type for all columns from the output of this command. Find the column with the data type array, and then change the data type of this column to string.

To update the schema of the table with Data Catalog, do the following:

  1. Open the AWS Glue console.
  2. On the navigation pane, choose Tables.
  3. Select the table that you want to update.
  4. Choose Action, and then choose View details.
  5. Choose Edit schema.
  6. Scroll to the column with data type array, and then choose array.
  7. For Column type, select string from the dropdown list.
  8. Choose Update.
  9. On the Edit schema page, choose Save.

GENERIC_INTERNAL_ERROR: Value exceeds MAX_INT

To resolve this error, find the column with the data type int, and then update the data type of this column from int to bigint. To change the column data type, update the schema in the Data Catalog or create a new table with the updated schema.

Run the SHOW CREATE TABLE command to generate the query that created the table. Then view the column data type for all columns from the output of this command. Find the column with the data type int, and then change the data type of this column to bigint.

To update the schema of the table with Data Catalog, do the following:

  1. Open the AWS Glue console.
  2. On the navigation pane, choose Tables.
  3. Select the table that you want to update.
  4. Choose Action, and then choose View details.
  5. Choose Edit schema.
  6. Scroll to the column with data type int, and then choose int.
  7. For Column type, select bigint from the dropdown list.
  8. Choose Update.
  9. On the Edit schema page, choose Save.

GENERIC_INTERNAL_ERROR: Value exceeds MAX_BYTE

To resolve this error, find the column with the data type tinyint. Then, change the data type of this column to smallint, int, or bigint. Or, you can resolve this error by creating a new table with the updated schema.

Run the SHOW CREATE TABLE command to generate the query that created the table. Then, view the column data type for all columns from the output of this command. Find the column with the data type tinyint, and change the data type of this column to smallint, bigint, or int.

To update the schema of the table with Data Catalog, do the following:

  1. Open the AWS Glue console.
  2. In the navigation pane, choose Tables.
  3. Select the table that you want to update.
  4. Choose Action, and then choose View details.
  5. Choose Edit schema.
  6. Scroll to the column with data type tinyint, and then choose tinyinit.
  7. For Column type, select smallint, bigint, or int from the dropdown list.
  8. Choose Update.
  9. On the Edit schema page, choose Save.

GENERIC_INTERNAL_ERROR: Number of partition values does not match number of filters

To resolve this error, do either of the following:

ALTER TABLE doc_example_table DROP PARTITION (date = '2014-05-14');
    
ALTER TABLE doc_example_table ADD PARTITION (date = '2016-05-14', country = 'IN');

GENERIC_INTERNAL_ERROR: Multiple entries with same key

If rows have multiple columns with the same key, pre-processing the data is required to include a valid key-value pair. If only some of the records have duplicate keys, and if you want to ignore these records, set ignore.malformed.json as SERDEPROPERTIES in org.openx.data.jsonserde.JsonSerDe.

If the key names are same but in different cases (for example: “Column”, “column”), you must use mapping. This is because hive doesn’t support case sensitive columns. To do this, you must configure SerDe to ignore casing.

Do the following:

CREATE TABLE mytable (  
  time1 string, 
  time2 string)
 ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
"case.insensitive" = "false", --tells hive to ignore key case
"mapping.time1"= "time", -- lowercase 'time' mapped into 'time1'
"mapping.time2"= "Time") -- uppercase to 'time2'

Related information

Data types in Amazon Athena

Partitioning data in Athena

AWS OFFICIAL
AWS OFFICIALUpdated a year ago