My Amazon Athena query fails with the error "HIVE_BAD_DATA: Error parsing field value for field X: For input string: "12312845691""

Last updated: 2020-03-30

When I query data in Amazon Athena, I get an error similar to one of the following:

"HIVE_BAD_DATA: Error parsing field value for field X: For input string: "12312845691"

"HIVE_BAD_DATA: Error parsing column '0': target scale must be larger than source scale"

Short Description

There are several versions of the HIVE_BAD_DATA error. If the error message specifies a null or empty input string (for example, "For input string: """), see When I query CSV data in Athena, I get the error "HIVE_BAD_DATA: Error parsing field value '' for field X: For input string: """.

Errors that specify an input string with a value happen when one of the following is true:

  • The data type defined in the table definition doesn't match the actual source data.
  • A single field contains different types of data (for example, an integer value for one record and a decimal value for another record).

Resolution

It's a best practice to use only one data type in a column. Otherwise, the query might fail. To resolve errors, be sure that each column contains values of the same data type, and that the values are in the allowed ranges.

If you still get errors, change the column's data type to a compatible data type that has a higher range. If you can't solve the problem by changing the data type, try the solutions in the following examples.

Example 1

  • Source format: JSON
  • Issue: In the last record, the "id" key value is "0.54," which is the DECIMAL data type. However, for the other records, the "id" key value is set to INT.

Source data:

{ "id" : 50, "name":"John" }
{ "id" : 51, "name":"Jane" }
{ "id" : 53, "name":"Jill" }
{ "id" : 0.54, "name":"Jill" }

Data Definition Language (DDL) statement:

CREATE EXTERNAL TABLE jsontest_error_hive_bad_data (
    id INT,
    name STRING
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( 'ignore.malformed.json' = 'true')
LOCATION 's3://awsexamplebucket/jsontest_error_hive_bad_data/';

Data Manipulation Language (DML) statement:

SELECT * 
FROM jsontest_error_hive_bad_data

Error:

Your query has the following error(s):
HIVE_BAD_DATA: Error parsing field value '0.54' for field 0: For input string: "0.54"
This query ran against the "default" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: bd50793b-94fc-42a7-b131-b7c81da273b2.

To resolve this issue, redefine the "id" column as STRING. The STRING data type can correctly represent all values in this dataset. Example:

CREATE EXTERNAL TABLE jsontest_error_hive_bad_data_correct_id_data_type (
    id STRING,
    name STRING
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( 'ignore.malformed.json' = 'true')
LOCATION 's3://awsexamplebucket/jsontest_error_hive_bad_data/';

DML statement:

SELECT * 
FROM jsontest_error_hive_bad_data_correct_id_data_type

You can also CAST to the desired data type. For example, you can cast a string as an integer. However, depending on which data types you are casting from and to, this might return null or inaccurate results. Values that can't be cast are discarded. For example, casting the string value "0.54" to INT returns null results:

SELECT TRY_CAST(id AS INTEGER) 
FROM jsontest_error_hive_bad_data_correct_id_data_type

Example output:

Results
     _col0
1    50
2    51
3    53
4

The output shows that the value "0.54" was discarded. You can't cast that value directly from a string to an integer. To resolve this issue, use COALESCE to CAST the mixed type values in the same column as the output. Then, allow the aggregate function to run on the column. Example:

SELECT COALESCE(TRY_CAST(id AS INTEGER), TRY_CAST(id AS DECIMAL(10,2))) 
FROM jsontest_error_hive_bad_data_correct_id_data_type

Output:

Results
 	_col0
1	50.00
2	51.00
3	53.00
4	0.54

Run aggregate functions:

SELECT SUM(COALESCE(TRY_CAST(id AS INTEGER), TRY_CAST(id AS DECIMAL(10,2)))) 
FROM jsontest_error_hive_bad_data_correct_id_data_type

Output:

 	_col0
1	154.54

Example 2

  • Source format: JSON
  • Issue: The "id" column is defined as INT. Athena couldn't parse "49612833315" because the range for INT values in Presto is -2147483648 to 2147483647.

Source data:

{ "id" : 50, "name":"John" }
{ "id" : 51, "name":"Jane" }
{ "id" : 53, "name":"Jill" }
{ "id" : 49612833315, "name":"Jill" }

DDL statement:

CREATE EXTERNAL TABLE jsontest_error_hive_bad_data_sample_2 (
    id INT,
    name STRING
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( 'ignore.malformed.json' = 'true')
LOCATION 's3://awsexamplebucket/jsontest_error_hive_bad_data_2/';

DML statement:

SELECT * 
FROM jsontest_error_hive_bad_data_sample_2

Error:

Your query has the following error(s):
HIVE_BAD_DATA: Error parsing field value '49612833315' for field 0: For input string: "49612833315"
This query ran against the "default" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: 05b55fb3-481a-4012-8c0d-c27ef1ee746f.

To resolve this issue, define the "id" column as BIGINT, which can read the value "49612833315." For more information, see Integer Types.

Modified DDL statement:

CREATE EXTERNAL TABLE jsontest_error_hive_bad_data_sample_2_corrected (
    id BIGINT,
    name STRING
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( 'ignore.malformed.json' = 'true')
LOCATION 's3://awsexamplebucket/jsontest_error_hive_bad_data_2/';

Example 3

  • Source format: JSON
  • Issue: The input data is DECIMAL and the column is defined as DECIMAL in the table definition. However, the scale is defined as 2, which doesn't match the "0.000054" value. For more information, see DECIMAL or NUMERIC Type.

Source data:

{ "id" : 0.50, "name":"John" }
{ "id" : 0.51, "name":"Jane" }
{ "id" : 0.53, "name":"Jill" }
{ "id" : 0.000054, "name":"Jill" }

DDL statement:

CREATE EXTERNAL TABLE jsontest_error_hive_bad_data_sample_3(
    id DECIMAL(10,2),
    name STRING
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( 'ignore.malformed.json' = 'true')
LOCATION 's3://awsexamplebucket/jsontest_error_hive_bad_data_3/';

DML statement:

SELECT * 
FROM jsontest_error_hive_bad_data_sample_3

Error:

Your query has the following error(s):
HIVE_BAD_DATA: Error parsing column '0': target scale must be larger than source scale
This query ran against the "default" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: 1c3c7278-7012-48bb-8642-983852aff999.

To resolve this issue, redefine the column with a scale that captures all input values. For example, instead of (10,2), use (10,7).

CREATE EXTERNAL TABLE jsontest_error_hive_bad_data_sample_3_corrected(
    id DECIMAL(10,7),
    name STRING
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( 'ignore.malformed.json' = 'true')
LOCATION 's3://awsexamplebucket/jsontest_error_hive_bad_data_3/';

Did this article help you?

Anything we could improve?


Need more help?