AWS Official Blog

Amazon RDS – Multi-AZ Deployments For Enhanced Availability & Reliability

by Jeff Barr | in Amazon RDS

Amazon RDS simplifies many of the common tasks associated with the deployment, operation, and scaling of a relational database. You don’t have to worry about acquiring and installing hardware, loading an operating system, installing and configuring MySQL, or managing backups. In addition, scaling the processing power or storage space available to your database is as simple as an API call.

When we rolled out Amazon RDS last October, we also announced plans to have a “High Availability” option in the future. That option is now ready for you to use, and it’s called “Multi-AZ Deployments.” AZ is short for “Availability Zone”; each of the four AWS Regions comprises two or more such zones, each with independent power, cooling, and network connectivity.

The availability and reliability characteristics of Multi-AZ deployments make them well suited for critical production environments.  I’d like to tell you about this new feature and how it works; here’s a diagram to get you started:

It is really easy to benefit from the enhanced availability and data durability provided by a DB Instance deployment that spans multiple Availability Zones. All you need to do is supply one additional parameter to the CreateDBInstance function and Amazon RDS will take care of the rest.

To be more specific, when you launch a DB Instance with the Multi-AZ parameter set to true, Amazon RDS will create a primary in one Availability Zone and a hot standby in a second Availability Zone in the same Region. Data written to the primary will be synchronously replicated to the standby. If the primary fails, the standby becomes the primary and a new standby is created automatically. Amazon RDS automatically detects failure and takes care of all of this for you. The entire failover process takes approximately three minutes. In addition, existing standard DB Instance deployments can be converted to Multi-AZ deployments by changing the Multi-AZ parameter to true with the ModifyDBInstance function (a hot standby will be created for your current primary).
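As a rough illustration of the two calls described above, here is a minimal Python sketch. It assumes a client object shaped like the one returned by boto3's `boto3.client("rds")`, whose `create_db_instance` and `modify_db_instance` methods correspond to CreateDBInstance and ModifyDBInstance. The helper names, the instance class, and the storage size are placeholders chosen for this example, not values from the announcement.

```python
from typing import Any, Dict


def multi_az_create_params(identifier: str, username: str, password: str) -> Dict[str, Any]:
    """Build CreateDBInstance arguments for a new Multi-AZ MySQL DB Instance."""
    return {
        "DBInstanceIdentifier": identifier,
        "Engine": "mysql",
        "DBInstanceClass": "db.m5.large",  # placeholder instance class (assumption)
        "AllocatedStorage": 20,            # placeholder size in GiB (assumption)
        "MasterUsername": username,
        "MasterUserPassword": password,
        "MultiAZ": True,                   # the one additional parameter
    }


def multi_az_convert_params(identifier: str) -> Dict[str, Any]:
    """Build ModifyDBInstance arguments that convert an existing instance to Multi-AZ."""
    return {
        "DBInstanceIdentifier": identifier,
        "MultiAZ": True,
        "ApplyImmediately": True,  # assumption: do not wait for the maintenance window
    }


def create_multi_az_instance(rds_client: Any, identifier: str, username: str, password: str) -> Any:
    """Create a primary plus hot standby by passing MultiAZ=True."""
    return rds_client.create_db_instance(
        **multi_az_create_params(identifier, username, password)
    )


def convert_to_multi_az(rds_client: Any, identifier: str) -> Any:
    """Ask RDS to add a hot standby to an existing standard deployment."""
    return rds_client.modify_db_instance(**multi_az_convert_params(identifier))
```

With boto3 installed and credentials configured, usage might look like `create_multi_az_instance(boto3.client("rds"), "mydb", "admin", password)`. The client is passed in rather than created inside the helpers so the request parameters can be inspected or tested without touching AWS.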

When automatic failover occurs, your application can remain unaware of what’s happening behind the scenes. The CNAME record for your DB Instance will be altered to point to the newly promoted standby. Your MySQL client library should be able to close and reopen the connection in the event of a failover. If your application needs to know that a failover has occurred, you can check the events recorded for your DB Instance for the appropriate event.
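The close-and-reopen behavior described above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not a prescribed client configuration: `connect` and `operation` are hypothetical callables supplied by your application, the default retry window (20 attempts, 15 seconds apart) is simply chosen to outlast a failover of about three minutes, and `failover_messages` assumes event records shaped like the dictionaries in the `Events` list returned by boto3's `describe_events`, matching on the word "failover" in the message text.

```python
import time
from typing import Any, Callable, Dict, Iterable, List, Tuple, Type


def _close_quietly(conn: Any) -> None:
    """Close a connection, ignoring errors from an already-broken socket."""
    try:
        conn.close()
    except Exception:
        pass


def run_with_reconnect(
    connect: Callable[[], Any],
    operation: Callable[[Any], Any],
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    attempts: int = 20,
    delay_seconds: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Run operation(conn); on a connection error, reopen and try again.

    Reconnecting by hostname lets the client pick up the CNAME change
    once the standby has been promoted.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        conn = None
        try:
            conn = connect()  # may itself fail while the failover is in progress
            result = operation(conn)
        except retry_on:
            if conn is not None:
                _close_quietly(conn)
            if attempt == attempts:
                raise
            sleep(delay_seconds)
        else:
            _close_quietly(conn)
            return result


def failover_messages(events: Iterable[Dict[str, Any]]) -> List[str]:
    """Return the messages of events that mention a failover (assumed wording)."""
    return [
        event["Message"]
        for event in events
        if "failover" in event.get("Message", "").lower()
    ]
```

With a real MySQL driver you would pass that driver's connection-error class in `retry_on`; the default of `OSError` only covers socket-level failures. The `sleep` parameter exists so the wait can be replaced in tests.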

If you have set up an Amazon RDS DB Instance as a Multi-AZ deployment, automated backups are taken from the standby to enhance DB Instance availability (by avoiding I/O suspension on the primary). The standby also plays an important role in patching and DB Instance scaling. In order to minimize downtime during planned maintenance, patches are installed on the standby and then an automatic failover makes the standby into the new primary. Similarly, scaling to a larger DB Instance type takes place on the standby, followed by an automatic failover.

Multi-AZ deployments also offer enhanced data protection and reliability in unlikely failure modes. For example, if a storage volume backing a Multi-AZ DB Instance fails, you are not required to initiate a Point-in-Time restore to the LatestRestorableTime (typically five minutes prior to the failure). Instead, Amazon RDS will simply detect the failure and promote the hot standby, where all database updates are intact.

Putting it all together, this new feature means that your AWS-powered application can remain running in the face of a disk, DB Instance, or Availability Zone failure. Once again, you can focus on your application and let AWS handle the “dirty work” for you.

While you cannot use the synchronous standby in a Multi-AZ deployment to serve read traffic, we are also working on a Read Replica feature. This feature will make it easier to take advantage of MySQL’s built-in asynchronous replication functionality if you need to scale your read traffic beyond the capacity of a single DB Instance. You’ll be able to provision multiple “Read Replicas” for a given source DB Instance.

— Jeff;