Why does Amazon Redshift run the COPY ANALYZE process when STATUPDATE is turned off?
Last updated: 2022-11-21
I tried to use the COPY command to load data into a new table in Amazon Redshift. I even added the COMPUPDATE OFF parameter. Why does COPY ANALYZE still run?
Short description
Before you create a table in Amazon Redshift, select the distribution style of your table. If a table doesn't specify a distribution style, it defaults to AUTO distribution. Amazon Redshift's AUTO distribution feature then uses the COPY ANALYZE process to determine whether the table should use EVEN or ALL distribution.
Note: The COMPUPDATE OFF parameter still runs COPY ANALYZE on tables using an AUTO distribution style. The STATUPDATE OFF parameter does not disable or repress the COPY ANALYZE command.
Resolution
1. Verify the current distribution style of a table by running a SHOW TABLE:
SHOW TABLE [schema_name.]table_name
Note: By default, Amazon Redshift applies AUTO distribution if a distribution style isn't specified in the CREATE TABLE statement.
2. If AUTO distribution is turned on, then create a table with a defined distribution style. Specifying a distribution style prevents COPY ANALYZE from running.
The following example specifies an EVEN distribution style:
create table rs_example_test (id int)diststyle even;
For more information about distribution styles, see Working with data distribution styles.
3. Use the COPY command to load the data into your table, and then append it with the STATUPDATE OFF parameters.
Note: It's a best practice to use the COPY command to load large amounts of data, instead of using individual INSERT statements. This minimizes the risk of performance issues.
4. Run a query on the SVL_STATEMENTTEXT system table, and then filter by the process ID (PID) related to your session:
select userid, xid, pid, trim(label) as label, starttime, endtime, sequence, trim(type) as type, left(text,60) text from svl_statementtext where pid = 26815 order by xid, starttime, sequence;
The syntax above checks whether COPY ANALYZE ran as part of the COPY transaction.
The output will look similar to the following:
userid xid pid label starttime endtime sequence type text
100 78915 26815 default 2019-03-19 14:06:19 2019-03-19 14:06:21 0 QUERY copy rs_test3 from 's3://xxxxx-rs-log/AWSLogs/rs_test2.csv'\
100 78915 26815 default 2019-03-19 14:06:21 2019-03-19 14:06:21 0 UTILITY COMMIT
This confirms that COPY ANALYZE no longer runs as part of the COPY transaction.