I used an AWS Glue crawler to create an Amazon Athena table from a CSV file. When I query the first column of the table, I get an error message similar to the following: "SYNTAX_ERROR: line 1:9: Column 'column_id' cannot be resolved."
This error usually occurs when your CSV file is encoded in UTF-8 format with a byte order mark (BOM). AWS Glue doesn't recognize BOMs and changes BOMs to question marks, which Amazon Athena doesn't recognize.
To resolve the syntax error, remove the question mark in Athena or in AWS Glue.
Byte order marks are automatically added to the beginning of the name of the first column. To see the byte order mark, open your file in a hex editor. It will look something like this:
|2||69900001||69924999||AC - Capital||15|
When you run an AWS Glue crawler on the file, the crawler adds a question mark next to the byte order mark. Because the crawler uses the column headers as the column names, the first column in your Athena table might have a name like ?POSTAL_CODE_START. (You can see the question mark when you run SHOW CREATE TABLE.) In Athena, column names do not allow special characters other than underscore (_).
To remove the question mark in Athena
- Copy the crawler DDL statement (SHOW CREATE TABLE table_name).
- Remove the question mark.
- Run the statement.
Note: Be sure to include 'skip.header.line.count'='1' in the table properties.
To remove the question mark in AWS Glue
- Open the AWS Glue console.
- Select your table.
- On the Action drop-down menu, choose View details.
- Choose Edit schema.
- Select the first column and then rename it. You can use the same name for the column, if you prefer.
Note: When you enter the name, AWS Glue removes the question mark, even though you might not see the question mark in the console.
- Choose Save in the top right corner of the page.
To configure AWS Glue to not rebuild the table schema
Before running the crawler again on the same table:
- On the AWS Glue console, choose Crawlers, and then select your crawler.
- Open the Action drop-down menu, and then choose Edit crawler.
- Choose Output.
- Expand Configuration options.
- Select the following options under When the crawler detects schema changes in the data store, how should AWS Glue handle table updates in the data catalog?
Ignore the change and don't update the table in the data catalog.
Update all new and existing partitions with metadata from the table.
- Under How should AWS Glue handle deleted objects in the data store?, select Delete tables and partitions from the data catalog.
- Choose Next, and then choose Finish.