Why did my query abort in Amazon Redshift?

Last updated: 2020-10-15

My query in Amazon Redshift was aborted with an error message. Why did my query abort?

Short description

A query can abort in Amazon Redshift for the following reasons:

  • Setup of Amazon Redshift workload management (WLM) query monitoring rules
  • Statement timeout value
  • ABORT, CANCEL, or TERMINATE requests
  • Network issues
  • Cluster maintenance upgrades
  • Internal processing errors
  • ASSERT errors

To prevent your query from being aborted, consider the following approaches:

  • Increase your timeout parameter.
  • Update your WLM QMR rules.
  • Schedule long-running operations outside of maintenance windows.

Resolution

Setup of Amazon Redshift WLM query monitoring rules

You can create WLM query monitoring rules (QMRs) to define metrics-based performance boundaries for your queues. You can also specify that actions that Amazon Redshift should take when a query exceeds the WLM time limits. For example, you can create a rule that aborts queries that run for more than a 60-second threshold.

Example 1: "Abort" action specified in the query monitoring rule

If a query is aborted because of the "abort" action specified in a query monitoring rule, the query returns the following error:

ERROR:  Query (500029) cancelled by WLM abort action of Query Monitoring Rule "testrule".

To identify whether a query was aborted because of an "abort" action, run the following query:

select * from STL_WLM_RULE_ACTION where action = 'abort';

The query output lists all queries that are aborted by the "abort" action. If your query ID is listed in the output, increase the time limit in the WLM QMR parameter.

Example 2: No available queues for the query to be hopped

A query can be hopped if the "hop" action is specified in the query monitoring rule. When a query is hopped, WLM tries to route the query to the next matching queue based on the WLM queue assignment rules. If the query doesn't match a queue definition, then the query is canceled. A canceled query isn't reassigned to the default queue. For more information about the WLM timeout behavior, see Properties for the wlm_json_configuration parameter.

Note: You can hop queries only in a manual WLM configuration.

If a query is hopped but no matching queues are available, then the canceled query returns the following error message:

ERROR: Query (500104) canceled on user's request and ran out of wlm queues for restart.

If your query is aborted with this error message, then check the user-defined queues:

select * from stl_wlm_query where query=<query-id>;

In your output, the service_class entries 6-13 include the user-defined queues. For example, service_class 6 might list Queue1 in the WLM configuration, and service_class 7 might list Queue2.

To obtain more information about the service_class to queue mapping, run the following query:

select * from stv_wlm_service_class_config where service_class>5;

After you get the queue mapping information, check the WLM configuration from the Amazon Redshift console. Verify whether the queues match the queues defined in the WLM configuration. A query can be hopped only if there is a matching queue available for the user group or query group configuration. For more information, see WLM query queue hopping.

Statement timeout value

The statement_timeout value is the maximum amount of time that a query can run before Amazon Redshift terminates it. When a statement timeout is exceeded, then queries submitted during the session are aborted with the following error message:

ERROR:  Query (150) cancelled on user's request

To verify whether a query was aborted because of a statement timeout, run following query:

select * from SVL_STATEMENTTEXT where text ilike '%set%statement_timeout%to%' and pid in (select pid from STL_QUERY where query = <queryid>);

Statement timeouts can also be set in the cluster parameter group. Check your cluster parameter group and any statement_timeout configuration settings for additional confirmation. For more information about the cluster parameter group and statement_timeout settings, see Modifying a parameter group.

ABORT, CANCEL, or TERMINATE requests

To check if a particular query was aborted or canceled by a user (such as a superuser), run the following command with your query ID:

select * from SVL_STATEMENTTEXT where text ilike '%cancel%' and xid 
    in (select xid from STL_QUERY where query = <queryid>);
select * from SVL_STATEMENTTEXT where text ilike '%abort%' and xid in (select xid from STL_QUERY where query = <queryid>);
If the query appears in the output, then the query was either aborted or canceled upon user request.

Note: Users can terminate only their own session. A superuser can terminate all sessions.

Queries can also be aborted when a user cancels or terminates a corresponding process (where the query is being run). These are examples of corresponding processes that can cancel or abort a query:

When a process is canceled or terminated by these commands, an entry is logged in SVL_TERMINATE. To confirm whether a query was aborted because a corresponding session was terminated, check the SVL_TERMINATE logs:

select * from SVL_TERMINATE where pid=(select pid from STL_QUERY where query=500534);

Network issues

Sometimes queries are aborted because of underlying network issues. To verify whether network issues are causing your query to abort, check the STL_CONNECTION_LOG entries:

select * from STL_CONNECTION_LOG where pid in (select pid from STL_QUERY where query = <query_id>);
The STL_CONNECTION_LOG records authentication attempts and network connections or disconnections. If your query appears in the output, a network connection issue might be causing your query to abort.

Cluster maintenance upgrades

If a scheduled maintenance occurs while a query is running, then the query is terminated and rolled back, requiring a cluster reboot. Schedule long-running operations (such as large data loads or the VACUUM operation) to avoid maintenance windows. For more information, see Schedule around maintenance windows.

To check if maintenance was performed on your Amazon Redshift cluster, choose the Events tab in your Amazon Redshift console.

Internal processing errors

The STL_ERROR table records internal processing errors generated by Amazon Redshift. The STL_ERROR table doesn't record SQL errors or messages.

To verify whether your query was aborted by an internal error, check the STL_ERROR entries:

select * from STL_ERROR where userid=<user id>;

ASSERT errors

Sometimes queries are aborted because of an ASSERT error. The ASSERT error can occur when there is an issue with the query itself. If you get an ASSERT error after a patch upgrade, update Amazon Redshift to the newest cluster version. Then, check the cluster version history. Or, you can roll back the cluster version.


Did this article help?


Do you need billing or technical support?