How do I troubleshoot data loading errors while using the COPY command in Amazon Redshift?
Last updated: 2021-07-01
I tried to use the COPY command to load a flat file. However, I'm experiencing data loading issues or errors in Amazon Redshift. How do I troubleshoot this?
Use the STL_LOAD_ERRORS table to identify any data loading errors that occur during a flat file load. The STL_LOAD_ERRORS table can help you track the progress of a data load, recording any failures or errors along the way. After you troubleshoot the identified issue, reload the data in the flat file while using the COPY command.
Tip: If you're using the COPY command to load a flat file in Parquet format, you can also use the SVL_S3LOG table. The SVL_S3LOG table can be used to identify any data loading errors.
Note: The following steps use an example dataset of cities and venues.
1. Check the data in your sample flat file to confirm that the source data is valid.
7|BMO Field|Toronto|ON|0 16|TD Garden|Boston|MA|0 23|The Palace of Auburn Hills|Auburn Hills|MI|0 28|American Airlines Arena|Miami|FL|0 37|Staples Center|Los Angeles|CA|0 42|FedExForum|Memphis|TN|0 52|PNC Arena|Raleigh|NC ,25 |0 59|Scotiabank Saddledome|Calgary|AB|0 66|SAP Center|San Jose|CA|0 73|Heinz Field|Pittsburgh|PA|65050
In this example demo.txt file, five fields are used, separated by a pipe character. For more information, see Load LISTING from a pipe-delimited file (default delimiter).
2. Open the Amazon Redshift console.
3. Create a sample table using the following DDL:
CREATE TABLE VENUE1( VENUEID SMALLINT, VENUENAME VARCHAR(100), VENUECITY VARCHAR(30), VENUESTATE CHAR(2), VENUESEATS INTEGER ) DISTSTYLE EVEN;
4. Create a view to preview the relevant columns from the STL_LOAD_ERRORS table:
create view loadview as (select distinct tbl, trim(name) as table_name, query, starttime, trim(filename) as input, line_number, colname, err_code, trim(err_reason) as reason from stl_load_errors sl, stv_tbl_perm sp where sl.tbl = sp.id);
This view can help you identify the cause of the data loading error.
5. Use the COPY command to load the data:
copy Demo from 's3://your_S3_bucket/venue/' iam_role 'arn:aws:iam::123456789012:role/redshiftcopyfroms3' delimiter '|' ;
Note: Replace your_S3_bucket with the name of your S3 bucket. Then, replace arn:aws:iam::123456789012:role/redshiftcopyfroms3 with the Amazon Resource Name (ARN) for your AWS Identity and Access Management (IAM) role. This IAM role must be able to access data from your S3 bucket. For more information, see Parameters.
6. Query the load view to display and review the error load details of the table:
testdb=# select * from loadview where table_name='venue1'; tbl | 265190 table_name | venue1 query | 5790 starttime | 2017-07-03 11:54:22.864584 input | s3:// your_S3_bucket/venue/venue_pipe0000_part_00 line_number | 7 colname | venuestate err_code | 1204 reason | Char length exceeds DDL length
In this example, the exception is caused by the length value, which must be added to the venuestate column. The (NC ,25 |) value is longer than the length defined in the VENUESTATE CHAR(2) DDL.
You can resolve this exception in two different ways:
- If the data is expected to exceed the defined length of the column, then review and update the table definition to modify the column length.
- If the data isn't properly formatted or transformed, then modify the data in file to use the correct value.
The output from this query includes the following important information:
- The file causing the error.
- The column causing the error.
- The line number in the input file.
- The reason for the exception.
7. Modify the data in your load file to use the correct values (the length must align with the defined column length):
7|BMO Field|Toronto|ON|0 16|TD Garden|Boston|MA|0 23|The Palace of Auburn Hills|Auburn Hills|MI|0 28|American Airlines Arena|Miami|FL|0 37|Staples Center|Los Angeles|CA|0 42|FedExForum|Memphis|TN|0 52|PNC Arena|Raleigh|NC|0 59|Scotiabank Saddledome|Calgary|AB|0 66|SAP Center|San Jose|CA|0 73|Heinz Field|Pittsburgh|PA|65050
8. Reload the data load:
testdb=# copy Demo from 's3://your_S3_bucket/sales/' iam_role 'arn:aws:iam::123456789012:role/redshiftcopyfroms3' delimiter '|' ; INFO: Load into table 'venue1' completed, 808 record(s) loaded successfully.
Note: The STL_LOAD_ERRORS table can hold only a limited amount of logs (typically for around 4 to 5 days). Also, standard users can view only their own data when querying the STL_LOAD_ERRORS table. To view all the table data, you must be a superuser.