Why did my Amazon Aurora PostgreSQL-Compatible cluster fail over?

Last updated: 2022-12-16

I want to know what caused my Amazon Aurora PostgreSQL-Compatible Edition DB cluster to fail over.

Short description

Aurora PostgreSQL-Compatible automatically performs instance failover to one of its cluster readers in these scenarios:

  • Infrastructure issue with writer instance. This includes loss of network connectivity to the physical host, loss of network connectivity to the cluster's volume, or issues with the physical computing resources.
  • Writer instance is not reachable. This issue is associated with an excessive workload, which causes performance bottlenecks and resource contention.
  • The writer's DB instance class type is changed as part of DB instance vertical scaling activity.
  • The underlying host of the Aurora writer instance is undergoing software patching, hardware maintenance, or an OS update during a specific maintenance window. For more information, see Maintaining an Amazon Aurora DB cluster.
  • The instance was failed over using the failover option at the instance level.

When the cluster's writer fails to respond to health checks, the cluster starts a failover to one of the cluster's readers, based on its assigned priority. For more information, see Failover with Amazon Aurora PostgreSQL.
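
To see which reader is the likely failover target, you can list the cluster members and their promotion tiers. The following is a minimal sketch. It assumes a boto3 RDS client is passed in as rds_client, and it assumes that a lower promotion tier number means a higher failover priority. The function names are illustrative only, and tie-breaking between readers in the same tier is not modeled.

```python
def readers_by_failover_priority(cluster):
    """Return reader instance IDs ordered by promotion tier, lowest tier first.

    `cluster` is one entry of the "DBClusters" list in a DescribeDBClusters
    response. Readers that share a tier keep their original order; this
    sketch does not model how Aurora chooses among them.
    """
    readers = [m for m in cluster["DBClusterMembers"] if not m["IsClusterWriter"]]
    readers.sort(key=lambda m: m["PromotionTier"])
    return [m["DBInstanceIdentifier"] for m in readers]


def describe_failover_order(rds_client, cluster_id):
    """Look up the cluster and return its readers in assumed failover order."""
    response = rds_client.describe_db_clusters(DBClusterIdentifier=cluster_id)
    return readers_by_failover_priority(response["DBClusters"][0])
```

With boto3 installed and credentials configured, rds_client would typically be boto3.client("rds").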

Resolution

To identify the reason behind the failover, check the following logs and metrics for your Aurora PostgreSQL-Compatible cluster.

Amazon RDS events

To identify the root cause of an unplanned outage, view all of the Amazon Relational Database Service (Amazon RDS) events from the failover period. All the events are registered in UTC/GMT time by default. If you want to store events for a longer period of time, then send the Amazon RDS events to Amazon CloudWatch Events. For more information, see Creating a rule that triggers on an Amazon Aurora event.
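
You can also retrieve the events programmatically. The following sketch pulls the events around the failover time through the RDS DescribeEvents API. It assumes a boto3 RDS client is passed in as rds_client and that you pass timezone-aware UTC datetimes. The helper names and the 30-minute window are illustrative choices, not recommendations from AWS.

```python
from datetime import timedelta


def failover_window(failover_time, minutes_before=30, minutes_after=30):
    """Return (start, end) datetimes surrounding the suspected failover time."""
    start = failover_time - timedelta(minutes=minutes_before)
    end = failover_time + timedelta(minutes=minutes_after)
    return start, end


def list_rds_events(rds_client, source_id, source_type, start, end):
    """Return (date, message) pairs for one source, oldest first.

    source_type is a DescribeEvents source type such as "db-cluster"
    or "db-instance".
    """
    paginator = rds_client.get_paginator("describe_events")
    events = []
    for page in paginator.paginate(
        SourceIdentifier=source_id,
        SourceType=source_type,
        StartTime=start,
        EndTime=end,
    ):
        for event in page["Events"]:
            events.append((event["Date"], event["Message"]))
    return sorted(events, key=lambda e: e[0])
```

Run the query once with the cluster identifier and the "db-cluster" source type, and once for each instance with "db-instance", because cluster-level and instance-level events are recorded against different sources.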

Amazon CloudWatch metrics

View the Amazon CloudWatch metrics for your Aurora PostgreSQL-Compatible cluster to check whether a high database load caused the outage. For more information, see Monitoring Amazon Aurora metrics with Amazon CloudWatch. Check for spikes (or, for FreeableMemory, sharp drops) in the following key metrics that indicate the availability and health status of your cluster or instance:

  • DatabaseConnections
  • CPUUtilization
  • FreeableMemory
  • DiskQueueDepth
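
To check these metrics outside the console, you can query them through the CloudWatch GetMetricStatistics API. The following sketch assumes a boto3 CloudWatch client is passed in as cloudwatch_client. The function names and any thresholds you pass are illustrative; choose thresholds that fit your instance class and workload.

```python
def metric_series(cloudwatch_client, instance_id, metric_name, start, end,
                  statistic="Maximum", period=60):
    """Return (timestamp, value) pairs for one AWS/RDS instance metric, oldest first."""
    response = cloudwatch_client.get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName=metric_name,
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": instance_id}],
        StartTime=start,
        EndTime=end,
        Period=period,
        Statistics=[statistic],
    )
    # Datapoints are not guaranteed to arrive in time order, so sort them.
    points = sorted(response["Datapoints"], key=lambda p: p["Timestamp"])
    return [(p["Timestamp"], p[statistic]) for p in points]


def breaches(series, threshold, above=True):
    """Keep datapoints at or above the threshold (or at or below it when above=False)."""
    if above:
        return [(ts, value) for ts, value in series if value >= threshold]
    return [(ts, value) for ts, value in series if value <= threshold]
```

For CPUUtilization, DatabaseConnections, and DiskQueueDepth, the "Maximum" statistic with above=True highlights spikes. For FreeableMemory, the "Minimum" statistic with above=False highlights low points.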

Enhanced Monitoring

To turn on Enhanced Monitoring for your Amazon Aurora instances, see Setting up and turning on Enhanced Monitoring.

Amazon RDS provides metrics in real time for the OS that your DB instance runs on. You can view all of the system metrics and process information for your PostgreSQL instances using the console. You can manage which metrics you want to monitor for each instance, and then customize the dashboard according to your requirements. For descriptions of the Enhanced Monitoring metrics, see OS metrics in Enhanced Monitoring.
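
Enhanced Monitoring also delivers its OS metrics as JSON records to Amazon CloudWatch Logs, so you can inspect the records from the failover period in code. The following sketch summarizes one record. It assumes the record uses the cpuUtilization, loadAverageMinute, and memory fields described in the OS metrics reference. The function name and the choice of fields are illustrative.

```python
import json


def summarize_os_metrics(message):
    """Extract a few load indicators from one Enhanced Monitoring JSON record.

    `message` is the raw JSON text of a single record. Field names are
    assumed to follow the OS metrics reference; adjust them if your
    records differ.
    """
    record = json.loads(message)
    return {
        "timestamp": record["timestamp"],
        "cpu_total_pct": record["cpuUtilization"]["total"],
        "load_avg_1m": record["loadAverageMinute"]["one"],
        "memory_free": record["memory"]["free"],
    }
```

A high one-minute load average together with low free memory just before the failover suggests that resource contention on the writer contributed to it.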

Performance Insights

Performance Insights expands on existing Amazon Aurora monitoring features to illustrate and help you analyze your cluster performance. Using the Performance Insights dashboard, you can visualize the database load on your Aurora PostgreSQL-Compatible cluster. You can filter the load by waits, SQL statements, hosts, or users.

For more information, see Monitoring DB load with Performance Insights on Amazon Aurora and Analyzing metrics with the Performance Insights dashboard.
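
The same load data is available through the Performance Insights GetResourceMetrics API. The following sketch ranks wait events by their peak contribution to database load during a time window. It assumes a boto3 Performance Insights client is passed in as pi_client, and that dbi_resource_id is the instance's resource ID (the value that starts with "db-"), not its instance name. The function name is illustrative.

```python
def top_wait_events(pi_client, dbi_resource_id, start, end, period=60, limit=5):
    """Return (wait_event_name, peak_load) pairs, highest peak first."""
    response = pi_client.get_resource_metrics(
        ServiceType="RDS",
        Identifier=dbi_resource_id,
        MetricQueries=[{
            "Metric": "db.load.avg",
            "GroupBy": {"Group": "db.wait_event", "Limit": limit},
        }],
        StartTime=start,
        EndTime=end,
        PeriodInSeconds=period,
    )
    peaks = []
    for series in response["MetricList"]:
        dimensions = series["Key"].get("Dimensions")
        if not dimensions:
            continue  # skip any series that is not broken down by wait event
        name = dimensions.get("db.wait_event.name", "unknown")
        # Some datapoints can be present without a value; ignore those.
        values = [p["Value"] for p in series["DataPoints"] if p.get("Value") is not None]
        if values:
            peaks.append((name, max(values)))
    return sorted(peaks, key=lambda item: item[1], reverse=True)
```

Compare the peak load with the number of vCPUs on the writer. A load well above the vCPU count before the failover points to an overloaded instance.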

Aurora database logs

In on-premises databases, the DB logs reside on the file system. Amazon RDS and Amazon Aurora don't give host access to the DB logs on the file system of your Aurora PostgreSQL-Compatible clusters. You can use Amazon CloudWatch Logs to analyze the log data. For more information, see Publishing Aurora PostgreSQL logs to Amazon CloudWatch Logs.

You can also watch a log file by using the AWS Management Console. For more information, see Watching a database log file.
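
If you publish the PostgreSQL logs to CloudWatch Logs, you can search them for errors from the failover period. The following sketch assumes a boto3 CloudWatch Logs client is passed in as logs_client, that the cluster publishes to the /aws/rds/cluster/<cluster name>/postgresql log group, and that start and end are timezone-aware datetimes. The function names and the default "FATAL" filter pattern are illustrative.

```python
def to_epoch_ms(moment):
    """Convert a timezone-aware datetime to milliseconds since the Unix epoch."""
    return int(moment.timestamp() * 1000)


def search_postgresql_log(logs_client, cluster_id, start, end, pattern="FATAL"):
    """Return (log_stream, timestamp_ms, message) for log events that match the pattern."""
    paginator = logs_client.get_paginator("filter_log_events")
    matches = []
    for page in paginator.paginate(
        logGroupName=f"/aws/rds/cluster/{cluster_id}/postgresql",
        filterPattern=pattern,
        startTime=to_epoch_ms(start),
        endTime=to_epoch_ms(end),
    ):
        for event in page["events"]:
            matches.append((event["logStreamName"], event["timestamp"], event["message"]))
    return matches
```

Each instance in the cluster writes to its own log stream, so the log stream name in each result shows which instance reported the message.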

Fast failover with Amazon Aurora PostgreSQL-Compatible

To make sure that failover happens as quickly as possible in your DB clusters, see Fast failover with Amazon Aurora PostgreSQL.

Fast recovery after failover with cluster cache management for Aurora PostgreSQL-Compatible

To make sure that your writer DB instance has fast recovery after a failover, see Fast recovery after failover with cluster cache management for Aurora PostgreSQL.