Why does Amazon Redshift run the COPY ANALYZE process when STATUPDATE is turned off?

Last updated: 2020-03-25

I tried to use the COPY command to load data into a new table in Amazon Redshift. I even added the COMPUPDATE OFF parameter. Why does COPY ANALYZE still run?

Short Description

Before you create a table in Amazon Redshift, select the distribution style of your table. If a table doesn't specify a distribution style, it defaults to AUTO distribution. Amazon Redshift's AUTO distribution feature then uses the COPY ANALYZE process to determine whether the table should use EVEN or ALL distribution.

Note: The COMPUPDATE OFF parameter still runs COPY ANALYZE on tables using an AUTO distribution style. The STATUPDATE OFF parameter does not disable or repress the COPY ANALYZE command.

Resolution

1.    Verify the current distribution style of a table by running a query on the PG_CLASS_INFO view:

select relname, releffectivediststyle from PG_CLASS_INFo where relname = 'rs_example_test'

2.    Check to see if your distribution style is set to AUTO distribution:

select relname, releffectivediststyle from PG_CLASS_INFo where relname = 'rs_test2'
relname | releffectivediststyle
rs_test2 | 10

Note: If the releffectivediststyle value is 10 or 11, then the table was created without specifying a distribution style. Amazon Redshift then assigns the distribution style to AUTO distribution. By default, Amazon Redshift applies AUTO distribution if a distribution style isn't specified in the CREATE TABLE statement.

3.    If AUTO distribution is enabled, then create a table with a defined distribution style. Specifying a distribution style prevents COPY ANALYZE from running.

The following example specifies an EVEN distribution style:

create table rs_example_test (id int)diststyle even;

For more information about distribution styles, see Choosing a Data Distribution Style.

4.    Use the COPY command to load the data into your table, and then append it with the STATUPDATE OFF parameters.

Note: It's a best practice to use the COPY command to load large amounts of data, instead of using individual INSERT statements. This minimizes the risk of performance issues.

5.    Run a query on the SVL_STATEMENTTEXT system table, and then filter by the process ID (PID) related to your session:

select userid, xid, pid, trim(label) as label, starttime, endtime, sequence, trim(type) as type, left(text,60) text from svl_statementtext where pid = 26815 order by xid, starttime, sequence;

The syntax above checks whether COPY ANALYZE ran as part of the COPY transaction.

The output should look like the following:

userid  xid	    pid	    label	  starttime	            endtime	           sequence	 type	  text
100	   78915	26815   default	  2019-03-19  14:06:19	2019-03-19 14:06:21	  0	     QUERY	  copy rs_test3 from 's3://xxxxx-rs-log/AWSLogs/rs_test2.csv'\
100	   78915	26815   default	  2019-03-19  14:06:21	2019-03-19 14:06:21	  0	     UTILITY  COMMIT

This confirms that COPY ANALYZE no longer runs as part of the COPY transaction.


Did this article help you?

Anything we could improve?


Need more help?