AWS Official Blog

Amazon EC2, MySQL, Amazon S3

by Jeff Barr | in Thought Pieces

I was on a conference call yesterday, and the topic of ways to store persistent data when using Amazon EC2 came up a couple of times. It would be really cool to have a persistent instance of a relational database like MySQL, but there's nothing like that around at the moment. An instance can have a copy of MySQL installed and can store as much data as it would like (subject to the 160GB size limit for the virtual disk drive), but there's no way to ensure that the data is backed up in case the instance terminates without warning.

Or is there?

It is fairly easy to configure multiple instances of MySQL in a number of master-slave, master-master, and other topologies. A master records each change to a database record in a transaction log (MySQL calls it the binary log). The slaves or co-masters keep an open connection to the master, reading the changes as they are logged and mimicking each change on their local copies. There can be some replication delay for various reasons, but the slaves have all of the information needed to maintain exact copies of the database tables on the master.

Put another way, the master essentially implements a simple service API for fetching changes as they occur, and the slaves slavishly make those same changes.
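To make that "service API" concrete, here is a toy, in-memory model in Python. It is a sketch under loose assumptions, not MySQL's actual replication protocol or binary log format: `ToyMaster`, `ToySlave`, and `changes_since` are hypothetical names, a "table" is just a dict, and each log entry is a simple `(op, key, value)` tuple. The point is that the master only has to answer one question, "what changed after position N?", and the slave only has to remember N.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMaster:
    """Hypothetical master: applies writes locally and appends each one to a log."""
    table: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def set(self, key, value):
        self.table[key] = value
        self.log.append(("set", key, value))

    def delete(self, key):
        self.table.pop(key, None)
        self.log.append(("delete", key, None))

    def changes_since(self, position):
        """The whole 'service API': every logged change after `position`."""
        return self.log[position:]


@dataclass
class ToySlave:
    """Hypothetical slave: replays the master's log onto its own copy."""
    table: dict = field(default_factory=dict)
    position: int = 0  # how many log entries have been applied so far

    def catch_up(self, master):
        for op, key, value in master.changes_since(self.position):
            if op == "set":
                self.table[key] = value
            elif op == "delete":
                self.table.pop(key, None)
            self.position += 1
```

Because the slave tracks nothing but its position, it can fall behind (the replication delay mentioned above) and later catch up without losing any changes.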

Hmmm…services…

What if the slave (client) wasn't another instance of MySQL? What if it was a very simple application that pulled down the transaction logs and wrote them into Amazon S3 objects on a frequent and regular basis? If the master were to disappear without warning (I could say crash here, but I won't), the information needed to restore the database to an earlier state would be safely squirreled away in S3.
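Such an archiving client could be quite small. The following sketch assumes the log is a plain list of `(op, key, value)` entries and uses a hypothetical in-memory `FakeBucket` in place of S3. A real version would read MySQL's binary log rather than a Python list and would upload with something like boto3's `put_object`. `LogArchiver` and its object-naming scheme are illustrative assumptions, not an existing tool.

```python
import json


class FakeBucket:
    """Hypothetical in-memory stand-in for an S3 bucket (key -> bytes)."""

    def __init__(self):
        self.objects = {}

    def put(self, key, body):
        self.objects[key] = body


class LogArchiver:
    """Hypothetical 'slave' that ships new log entries to object storage."""

    def __init__(self, bucket, prefix="binlog/"):
        self.bucket = bucket
        self.prefix = prefix
        self.position = 0  # number of log entries already archived

    def ship(self, log):
        """Write every entry not yet archived as one JSON object; return its key."""
        batch = log[self.position:]
        if not batch:
            return None
        start, end = self.position, self.position + len(batch)
        # Zero-padded positions: lexicographic key order matches log order.
        key = f"{self.prefix}{start:012d}-{end:012d}.json"
        self.bucket.put(key, json.dumps(batch).encode("utf-8"))
        # Advance only after the write, so a failed upload is retried next time.
        self.position = end
        return key
```

Calling `ship` on a timer gives the "frequent and regular" behavior; the padded positions in each key let a recovery process sort the objects back into log order.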

For recovery, we need another service. This one pretends to be a master, but it simply pulls that squirreled-away cache of transaction logs out of S3 and feeds it to a MySQL instance that is temporarily slaved to it. After all of the transactions have been replayed, the slave becomes the master and processing resumes.
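The recovery side can be sketched the same way. This hypothetical `replay_archive` function assumes the archived objects are JSON-encoded batches of `(op, key, value)` entries whose keys end in zero-padded `start-end` log positions. It applies the batches in key order to an empty dict standing in for the temporarily slaved MySQL instance, and it stops if a batch is missing rather than silently skipping changes.

```python
import json


def replay_archive(objects, prefix="binlog/"):
    """Rebuild table state from archived log batches (hypothetical format).

    objects: mapping of object key -> JSON bytes, e.g. the contents of a bucket.
    Returns (table, position) where position is the next log index to apply.
    """
    table = {}
    expected_start = 0
    for key in sorted(k for k in objects if k.startswith(prefix)):
        name = key[len(prefix):].removesuffix(".json")
        start, end = (int(part) for part in name.split("-"))
        if start != expected_start:
            # A missing batch would leave the copy silently inconsistent.
            raise ValueError(f"gap in archive before {key}")
        for op, k, value in json.loads(objects[key]):
            if op == "set":
                table[k] = value
            elif op == "delete":
                table.pop(k, None)
        expected_start = end
    return table, expected_start
```

The returned position marks where the rebuilt copy leaves off; once replay finishes, that copy can be promoted to master and new changes logged from there.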

Make sense? Could this work? What do you think, Brian?

I’d better run, or I’ll be late for my talk!

— Jeff;