My Amazon Redshift query has a WLM timeout set, but it keeps running after this period expires. What's going on?

The WLM timeout only applies to queries during the execution phase. When it appears that WLM has failed to terminate the query within the prescribed time limit, it's usually because the query has spent some time in stages other than the execution stage. For example, the query may wait to be parsed or rewritten, or it might wait on a lock, or wait for a spot in the WLM queue. 

When querying STV_RECENTS, starttime is the time the query entered the cluster, not the time the query began executing. When the query is in the Running state in STV_RECENTS, it is live in the system. To view the status of an executing query, query STV_INFLIGHT instead of STV_RECENTS:

select * from STV_INFLIGHT where query = <queryid>;

Use this query for more information about query stages:

select * from SVL_QUERY_REPORT where query = <queryid> ORDER BY segment, step, slice;

Use this query for the current execution state of the query:

select * from STV_EXEC_STATE where query = <queryid> ORDER BY segment, step, slice;

Here are some common reasons why a query will appear to run longer than the WLM timeout period:

The query is in the "return" phase

There are two "return" steps. Check STV_EXEC_STATE to see if the query has entered one of these return phases:

  • The return to the leader node from the compute nodes
  • The return to the client from the leader node

A rollback is in progress

If a DML operation (insert, update, delete) is performed, but the operation encounters an error and must roll back, the operation will not appear to be killed because it is already in the process of rolling back. You can view rollbacks by querying STV_EXEC_STATE. You can find additional information in STL_UNDONE.

The query spends time queuing prior to execution

You can find queuing time in STV_WLM_QUERY_STATE:

select * from STV_WLM_QUERY_STATE where query = <queryid>;

The query is waiting on a lock

If the query is visible in STV_RECENTS, but not in STV_WLM_QUERY_STATE, it's possible the query is waiting on a lock and hasn't entered the queue yet. Use a query similar to the following to find any locks the stuck process is waiting on:

select * from SVV_TRANSACTIONS

where granted='f'

and pid in (select pid from STV_RECENTS where query = <stuckqueryid>);

Use a query similar to the following to help identify the blocking process:


where b.granted='t'

and w.granted = 'f'

and b.txn_db = w.txn_db

and b.relation = w.relation

and in (select pid from STV_RECENTS where query = <stuckqueryid>);

To find more information about your possible blocking pid (such as the query it's currently running), run queries similar to the following:

select * from STL_QUERY where pid = <pid of process suspected of blocking your stuck query>;

select * from SVL_STATEMENTTEXT where pid = <pid of process suspected of blocking your stuck query>;

To kill the stuck query, or to kill the process that you suspect is blocking it, run a query similar to the following: 

select pg_terminate_backend(pid);

A networking or firewall issue

If an Amazon Redshift server has a problem communicating with your client, it can get stuck in the "return to client" state discussed above. Check for conflicts with any networking components that may block traffic to your server, including inbound on-premises firewall settings, outbound security group rules, or outbound network ACL rules.

An issue with the cluster

Issues on the cluster itself, such as hardware issues, might cause the query to hang, though these are generally rare. Check the status of the cluster in the Amazon Redshift console for the hardware-failure state. If your cluster is in this state, and it is a single-node cluster, recover your cluster from a snapshot. On a multi-node cluster, the failed node should be replaced automatically.

Did this page help you? Yes | No

Back to the AWS Support Knowledge Center

Need help? Visit the AWS Support Center.

Published: 2017-04-11