How can I perform a failover in my Aurora global database, and why did my failover fail?

I want to know how to perform a failover in my Amazon Aurora PostgreSQL-Compatible Edition global database, and I want to understand why a failover might fail.

Resolution

For an Aurora global database, there are two different options for performing a failover:

  • Manual unplanned failover ("detach and promote") - Use this option to recover from an unplanned outage or to do disaster recovery testing.
  • Managed planned failover - Use this option for operational maintenance or other planned operational procedures.

Manual unplanned failover

To fail over to a secondary cluster after an unplanned outage in the primary AWS Region, first do the following:

  1. Stop issuing DML statements and other write operations to the primary Aurora DB cluster in the AWS Region with the outage.
  2. Identify an Aurora DB cluster from a secondary AWS Region to use as a new primary DB cluster. If you have two or more secondary AWS Regions in your Aurora global database, then choose the secondary cluster that has the least lag time.
  3. Remove your chosen secondary DB cluster from the Aurora global database.
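Steps 2 and 3 can be sketched in Python as follows. This is a minimal illustration, not an official tool: the function names are hypothetical, and the lag values are assumed to have been collected already (for example, from the AuroraGlobalDBReplicationLag CloudWatch metric, in milliseconds). The second function only builds the argument list for the AWS CLI remove-from-global-cluster command. It doesn't run the command.

```python
from typing import Dict, List


def pick_least_lagged(lag_ms_by_cluster: Dict[str, float]) -> str:
    """Return the secondary cluster ARN with the smallest replication lag.

    lag_ms_by_cluster maps a cluster ARN to its most recent lag reading.
    How the readings are gathered is left to the caller (assumption).
    """
    if not lag_ms_by_cluster:
        raise ValueError("no secondary clusters supplied")
    return min(lag_ms_by_cluster, key=lag_ms_by_cluster.get)


def detach_command(global_cluster_id: str, cluster_arn: str, region: str) -> List[str]:
    """Build the AWS CLI call that removes a cluster from the global database."""
    return [
        "aws", "rds", "remove-from-global-cluster",
        "--global-cluster-identifier", global_cluster_id,
        "--db-cluster-identifier", cluster_arn,
        "--region", region,
    ]
```

To run the resulting command, you could pass the list to subprocess.run(cmd, check=True) after you confirm that writes to the old primary have stopped.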

Then, to complete the manual unplanned failover, do the following:

  1. Reconfigure your application to send all write operations to the now standalone Aurora DB cluster using its new endpoint. If you accepted the provided names when creating the Aurora global database, then change the endpoint by removing the -ro from the cluster's endpoint string.
    For example, the secondary cluster's endpoint my-global.cluster-ro-aabb.us-west-1.rds.amazonaws.com becomes my-global.cluster-aabb.us-west-1.rds.amazonaws.com when that cluster is detached from the Aurora global database.
    The secondary Aurora DB cluster becomes the primary cluster of a new Aurora global database when you start adding Regions to it.
  2. Add an AWS Region to the DB cluster to start the replication process from primary to secondary cluster.
  3. Add more AWS Regions as needed to recreate the topology needed to support your application.

For more information, see Recovering an Amazon Aurora global database from an unplanned outage.
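The endpoint change in step 1 can be expressed as a small helper. This is a sketch that assumes you accepted the default endpoint names, so the reader endpoint contains ".cluster-ro-". The function name writer_endpoint is hypothetical.

```python
def writer_endpoint(reader_endpoint: str) -> str:
    """Derive the cluster (writer) endpoint from a default reader endpoint.

    Assumes the default naming pattern, in which the reader endpoint
    contains '.cluster-ro-' and the writer endpoint contains '.cluster-'.
    """
    marker = ".cluster-ro-"
    if marker not in reader_endpoint:
        raise ValueError("endpoint does not follow the default reader pattern")
    # Replace only the first occurrence so the rest of the host name is untouched.
    return reader_endpoint.replace(marker, ".cluster-", 1)
```

For the example above, writer_endpoint("my-global.cluster-ro-aabb.us-west-1.rds.amazonaws.com") returns "my-global.cluster-aabb.us-west-1.rds.amazonaws.com". If you use custom endpoint names, then look up the new endpoint in the console instead.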

Managed planned failover

To start a managed planned failover for your Aurora global database, do the following:

  1. Open the Amazon RDS console.
  2. Choose Databases, and find the Aurora global database you want to fail over.
  3. From the Actions menu, choose Fail over global database. The failover is now pending, and the process doesn't begin until after you choose the failover target.
  4. Choose the secondary Aurora DB cluster that you want to promote to primary. The secondary DB cluster must be available.
    Note: If you have more than one secondary DB cluster, then compare the lag amount for all secondaries. Then, choose the one with the smallest amount of lag.
  5. Choose Fail over global database to confirm your choice of secondary DB cluster, and begin the failover process.
    The Status column of the Databases list shows the state of each Aurora DB instance and Aurora DB cluster during the failover process.
    The status bar at the top of the console displays progress and provides a Cancel failover option.
    If you choose Cancel failover, then a prompt asks whether to cancel the failover or to proceed with it.
  6. If you chose Cancel failover but want the failover to proceed, then choose Close to dismiss the prompt and continue failing over.

After the failover completes, you can see the Aurora DB clusters and their current state in the Databases list. For more information, see Performing managed planned failovers for Amazon Aurora global databases.

You can also use the AWS Command Line Interface (AWS CLI) to initiate a managed planned failover by running the failover-global-cluster command.
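As an illustration, the following Python sketch assembles a failover-global-cluster call. The function name and the sample identifiers are hypothetical. The example passes the target cluster as an Amazon Resource Name (ARN), and it relies on the AWS Region and credentials already configured for the AWS CLI.

```python
from typing import List


def failover_command(global_cluster_id: str, target_cluster_arn: str) -> List[str]:
    """Build the AWS CLI call for a managed planned failover.

    global_cluster_id: identifier of the Aurora global database.
    target_cluster_arn: ARN of the secondary cluster to promote.
    """
    return [
        "aws", "rds", "failover-global-cluster",
        "--global-cluster-identifier", global_cluster_id,
        "--target-db-cluster-identifier", target_cluster_arn,
    ]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    cmd = failover_command(
        "my-global",
        "arn:aws:rds:us-west-1:111122223333:cluster:my-secondary",
    )
    print(" ".join(cmd))
```

The script only prints the command. To run it, you could pass the list to subprocess.run(cmd, check=True).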

Reasons for a failed failover

A failover might fail due to one of the following reasons:

  • Replication lag between the source and target
  • Availability Zone failures
  • Compute node failures
  • Networking issues between DB instances
  • Storage issues
  • Large-scale events
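Some of these conditions can be checked before you start a failover. The following sketch covers only two of them: the target cluster's status and its replication lag. The function name and the lag threshold are assumptions chosen for illustration, not AWS-defined values, and the status and lag are expected to come from your own monitoring.

```python
from typing import List, Optional


def preflight_issues(status: str, lag_ms: Optional[float], max_lag_ms: float) -> List[str]:
    """Return a list of reasons to postpone a failover (empty if none found).

    status: current status string of the target secondary cluster.
    lag_ms: latest replication lag reading in milliseconds, or None if unknown.
    max_lag_ms: caller-chosen threshold (hypothetical, not an AWS limit).
    """
    issues: List[str] = []
    if status != "available":
        issues.append(f"target cluster status is '{status}', expected 'available'")
    if lag_ms is None:
        issues.append("replication lag is unknown")
    elif lag_ms > max_lag_ms:
        issues.append(f"replication lag {lag_ms} ms exceeds {max_lag_ms} ms")
    return issues
```

An empty list doesn't guarantee that the failover succeeds, because the other causes in the list, such as networking or storage issues, aren't visible to this check.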

Related information

Using failover in an Amazon Aurora global database

AWS OFFICIAL
Updated a year ago