Amazon Web Services Structured Data Storage Options
Amazon Web Services provides a number of options for storing structured data.
- Amazon RDS enables you to run a fully featured relational database while offloading database administration.
- Amazon SimpleDB provides simple index and query capabilities with seamless scalability.
- Using one of our many relational database AMIs on Amazon EC2 and Amazon EBS allows you to operate your own relational database in the cloud.
There are important differences between these alternatives that may make one more appropriate for your use case.
Amazon EC2 - Relational Database AMIs
You can use any of a number of leading relational databases on Amazon EC2. You can use an Amazon EC2 instance to run a database, and store the data within an Amazon Elastic Block Store (Amazon EBS) volume. Amazon EBS is a fast and reliable persistent storage feature of Amazon EC2. By designing, building, and managing your own relational database on Amazon EC2, you avoid the friction of provisioning and scaling your own infrastructure, while gaining access to a variety of standard database engines over which you can exert full administrative control. Available AMIs include IBM DB2, MySQL, Oracle, PostgreSQL, SQL Server, Sybase, and Vertica.
Amazon RDS
If your application requires relational storage, but you want to reduce the time you spend on database management, Amazon RDS automates common administrative tasks to reduce the complexity and total cost of ownership. Amazon RDS automatically backs up your database and maintains your database software, allowing you to spend more time on application development. With the native database access Amazon RDS provides, you get the programmatic familiarity, tooling and application compatibility of a traditional RDBMS. You also benefit from the flexibility of being able to scale the computing resources or storage capacity associated with your relational database instance using a single API call.
With Amazon RDS, you still control the database settings that are specific to your business. This includes building a relational schema to fit your use case, creating indices, and tuning the performance of your database to your application's workflow. You also take an active role in the scaling decisions for your database; you tell the service when you want to add more storage or change to a larger or smaller DB Instance class.
Amazon SimpleDB
For database implementations that do not require a relational model, and that principally demand index and query capabilities, Amazon SimpleDB eliminates the administrative overhead of running a highly-available production database, and is unbound by the strict requirements of an RDBMS. With Amazon SimpleDB, you store and query your data items with simple web service requests, and Amazon SimpleDB does the rest. In addition to handling infrastructure provisioning, software installation and maintenance, Amazon SimpleDB automatically indexes your data, creates geo-redundant replicas of your data to ensure high availability, and performs database tuning on your behalf. Amazon SimpleDB also provides no-touch scaling. That is, there is no need to anticipate and respond to changes in request load or database utilization. The service simply responds to traffic as it comes and goes, charging you only for the resources you consume. Finally, Amazon SimpleDB doesn't enforce a rigid schema for your data. This gives you flexibility; if your business changes, you can easily reflect these changes in Amazon SimpleDB without any schema updates or changes to your database code.
Amazon SimpleDB, however, is not a relational database, and does not offer some features needed in certain applications, such as complex transactions or joins. For this you need an RDBMS.
Functional Overview
Build your own database on Amazon EC2/EBS
You can use an Amazon EC2 instance to run a database, and store the data within an Amazon Elastic Block Store (Amazon EBS) volume. An Amazon Machine Image (AMI) is an encrypted machine image stored in Amazon S3. It contains all the information necessary to boot instances of your software. Many existing AMIs come packaged with relational databases, including:
IBM
- IBM DB2 Express Edition 9.5 32 Bit
- IBM DB2 Workgroup Edition 9.5 64 Bit
- IBM Informix Dynamic Server Express Edition 32 Bit
- IBM Informix Dynamic Server Workgroup Edition 64 Bit
Microsoft SQL Server
- Windows SQL Server Express + IIS + ASP.NET on Windows Server 2003 R2 (32bit)
- Windows SQL Server Express + IIS + ASP.NET on Windows Server 2003 R2 (64bit)
- Windows SQL Server 2005 Standard on Windows Server 2003 R2 (64bit)
MySQL
Oracle
- Oracle Database 11g Release 1 Enterprise Edition 32 Bit
- Oracle Database 11g Release 1 Enterprise Edition 64 Bit
- Oracle Database 11g Release 1 Standard Edition/Standard Edition One 32 Bit
- Oracle Database 10g Release 2 Express Edition 32 Bit
PostgreSQL
Sybase
- Sybase IQ 15.0 on Windows 32-bit Developer's Edition
- Sybase SQL Anywhere 11.0.1 Developer Edition on Windows 64-bit
- Sybase SQL Anywhere 11.0.1 Developer Edition on Windows 32-bit
- Sybase SQL Anywhere 11.0.1 Developer Edition on Fedora 8 32-bit
- Sybase ASE 15.0.3 ESD #1 on Windows 32-bit Developer's Edition
- Sybase Replication Server 15.2 on Windows 32-bit Evaluation Edition
Once you've launched one of these pre-built AMIs (or deployed some other database software on an Amazon EC2 instance), you'll want to create an Amazon Elastic Block Store (Amazon EBS) volume to persist your structured data. Amazon EBS is storage designed specifically for Amazon EC2 instances, allowing you to create volumes that can be mounted as devices by EC2 instances. Amazon EBS volumes behave as if they were raw unformatted external hard drives. They have user-supplied device names and provide a block device interface. You can create up to 20 Amazon EBS volumes, each sized from one gigabyte up to one terabyte, as appropriate for your data set.
Amazon EBS provides the ability to create snapshots of your Amazon EBS volumes to Amazon S3. You can use these snapshots as the starting point for new Amazon EBS volumes and to protect your data for long term durability.
Amazon EBS Volumes
- Persist beyond the lifetime of instances, protecting against data loss in the unlikely event of Amazon EC2 instance failure
- Provide high availability and reliability
- Attach to and detach from a running instance, allowing you, for example, to snapshot your data set, instantiate a new instance, and deploy a test or development database
Amazon EBS Snapshots
- Capture the current state of a volume
- Provide backup protection
- Can be used to instantiate new volumes, which contain the exact data of the snapshot
Amazon RDS
Getting started with Amazon RDS begins with the creation of a database instance (generally referred to as a DB Instance). This DB Instance is a fully functional database that you can access and interact with much like any stand-alone database server. An Amazon RDS DB Instance can contain multiple user-created databases, and can be accessed using the same command line tools and utilities used for stand-alone database servers.
Amazon RDS DB Instances are created using either command line tools or the APIs. Using the rds-create-db-instance command or the CreateDBInstance API, you can create your own Amazon RDS DB Instance by specifying:
- Instance identifier: a unique name for your Amazon RDS DB Instance
- Database engine: the underlying relational database engine (MySQL 5.1 currently supported)
- Compute class: the class whose memory and compute power meet your requirements
- Storage: amount of storage allocated to the DB Instance (from 5 GB to 1 TB)
- Master user: user with permission to create databases, manage users, etc.
- Master user password: password associated with the master user account
You can check the status of your create instance request with the rds-describe-db-instances command or the DescribeDBInstances API, and can start using your Amazon RDS DB Instance as soon as the instance status is "available."
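As a rough illustration of the creation and status-check steps above, the following Python sketch assembles the request parameters for CreateDBInstance and polls for the "available" status. It does not call the service: the function names, the engine string "MySQL5.1", and the class name "db.m1.small" are illustrative assumptions, and the status lookup is passed in as a plain callable that a real application would back with DescribeDBInstances.

```python
import time

MIN_STORAGE_GB = 5
MAX_STORAGE_GB = 1024  # 1 TB, taken here as 1024 GB


def build_create_db_instance_params(identifier, engine, instance_class,
                                    storage_gb, master_user, master_password):
    """Return the request parameters for a CreateDBInstance call (hypothetical helper)."""
    if not MIN_STORAGE_GB <= storage_gb <= MAX_STORAGE_GB:
        raise ValueError("storage must be between 5 GB and 1 TB")
    return {
        "Action": "CreateDBInstance",
        "DBInstanceIdentifier": identifier,
        "Engine": engine,
        "DBInstanceClass": instance_class,
        "AllocatedStorage": str(storage_gb),
        "MasterUsername": master_user,
        "MasterUserPassword": master_password,
    }


def wait_until_available(get_status, attempts=30, delay_seconds=10):
    """Poll a caller-supplied status function until it reports "available"."""
    for _ in range(attempts):
        if get_status() == "available":
            return True
        time.sleep(delay_seconds)
    return False


# Illustrative values only; none of these names come from a real account.
params = build_create_db_instance_params(
    identifier="mydbinstance",
    engine="MySQL5.1",
    instance_class="db.m1.small",
    storage_gb=20,
    master_user="admin",
    master_password="example-password",
)
```

Validating the storage range before sending the request keeps an obviously invalid call from reaching the service.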
Next, to protect against data loss, Amazon RDS enables point-in-time recovery by automatically creating a backup of your database. This backup occurs during a daily, user-configurable 2-hour period known as the backup window. Backups created during this window are retained for a user-configurable number of days (the retention period).
You can enable point-in-time recovery for an Amazon RDS DB Instance by setting the backup-retention-period parameter to a non-zero value using the rds-create-db-instance or rds-modify-db-instance commands, or the CreateDBInstance or ModifyDBInstance APIs. When a backup retention period is changed to a non-zero value, the first backup occurs immediately. Changing the backup retention period to 0 turns off automatic point-in-time backups for the instance, and will delete all automated backups for the instance. Turning off automated backups is discouraged.
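Because a retention period of 0 deletes all automated backups, an application might guard that change explicitly. The sketch below builds ModifyDBInstance parameters for a retention change; the helper name and the allow_disable guard are hypothetical conveniences, not part of the service.

```python
def backup_retention_params(identifier, retention_days, allow_disable=False):
    """Build ModifyDBInstance parameters for a backup retention change."""
    if retention_days < 0:
        raise ValueError("retention period cannot be negative")
    if retention_days == 0 and not allow_disable:
        # A value of 0 turns off automated backups and deletes existing ones,
        # so require the caller to opt in deliberately.
        raise ValueError("refusing to disable automated backups")
    return {
        "Action": "ModifyDBInstance",
        "DBInstanceIdentifier": identifier,
        "BackupRetentionPeriod": str(retention_days),
    }


# Illustrative call: keep automated backups for three days.
retention_request = backup_retention_params("mydbinstance", 3)
```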
Finally, if demand for your database grows beyond the capacity of your initial DB Instance, you can scale the computing resources and storage capacity with the ModifyDBInstance API. You can change memory and CPU resources by changing your DB Instance class, and change available storage when you modify your storage allocation. Your requested changes are applied during your specified maintenance window, or you can use the "apply-immediately" flag. Bear in mind that using this flag will apply any other pending system changes as well.
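A scaling request along these lines could be assembled as in the following sketch. It only builds the ModifyDBInstance parameters; the helper name and the example class name "db.m1.large" are assumptions made for illustration.

```python
def scale_params(identifier, instance_class=None, storage_gb=None,
                 apply_immediately=False):
    """Build ModifyDBInstance parameters for a compute and/or storage change."""
    if instance_class is None and storage_gb is None:
        raise ValueError("specify a new instance class, a new storage size, or both")
    params = {
        "Action": "ModifyDBInstance",
        "DBInstanceIdentifier": identifier,
    }
    if instance_class is not None:
        params["DBInstanceClass"] = instance_class
    if storage_gb is not None:
        params["AllocatedStorage"] = str(storage_gb)
    if apply_immediately:
        # Applying immediately also applies any other pending changes.
        params["ApplyImmediately"] = "true"
    return params


# Illustrative call: move to a larger class during the maintenance window.
scale_request = scale_params("mydbinstance", instance_class="db.m1.large")
```

Leaving apply_immediately off by default mirrors the behavior described above, where changes wait for the maintenance window unless you ask otherwise.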
Amazon SimpleDB
Amazon SimpleDB provides a simple web service interface to create and store multiple data sets, query your data easily, and return the results. Your data is automatically indexed, making it easy to quickly find the information that you need. There is no need to pre-define a schema or change a schema if new data is added later. And scaling out is as simple as creating new domains, rather than building out new servers.
The first step in storing data in Amazon SimpleDB is to create one or more domains. Domains are similar to database tables, except that you cannot perform functions across multiple domains, such as querying multiple domains or using foreign keys. You should note, however, that although the Amazon SimpleDB API cannot perform queries across multiple domains, you can design your applications to perform queries across multiple domains. Regardless, you should plan an Amazon SimpleDB data architecture that will meet the needs of your project.
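One way an application can query across domains is to run the same query against each domain and merge the results itself. The sketch below shows the idea with in-memory dictionaries standing in for domains; the domain names, item names, and attributes are invented, and a real application would issue one Select request per domain instead of scanning a dictionary.

```python
def query_domains(domains, domain_names, predicate):
    """Apply one predicate to each named domain and merge the matches."""
    matches = []
    for domain_name in domain_names:
        for item_name, attributes in domains[domain_name].items():
            if predicate(attributes):
                matches.append((domain_name, item_name, attributes))
    return matches


# In-memory stand-in: domain name -> item name -> attribute/value pairs.
domains = {
    "Orders": {
        "order-1": {"customer": "c42", "total": "19.99"},
        "order-2": {"customer": "c07", "total": "5.00"},
    },
    "Returns": {
        "return-1": {"customer": "c42", "reason": "damaged"},
    },
}

# Everything associated with customer c42, regardless of domain.
customer_activity = query_domains(
    domains, ["Orders", "Returns"],
    lambda attributes: attributes.get("customer") == "c42",
)
```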
After creating a domain, you are ready to start putting data into the domain. The PutAttributes operation creates or replaces attributes in an item. The attributes are specified using the Attribute.X.Name and Attribute.X.Value parameters. The first attribute is specified by the parameters Attribute.1.Name and Attribute.1.Value, the second attribute by the parameters Attribute.2.Name and Attribute.2.Value, and so on. The PutAttributes operation creates or replaces attributes for one item at a time. To create or replace attributes for multiple items in a single call, which can increase throughput and add efficiency, you should use the BatchPutAttributes API.
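The numbered parameter convention described above can be sketched as a small helper that flattens an item's attributes into Attribute.X.Name and Attribute.X.Value pairs. The helper name and the sample domain, item, and attributes are hypothetical; the sketch only builds the parameter dictionary and does not sign or send a request.

```python
def put_attributes_params(domain, item_name, attributes, replace=False):
    """Flatten attributes into numbered PutAttributes request parameters."""
    params = {
        "Action": "PutAttributes",
        "DomainName": domain,
        "ItemName": item_name,
    }
    index = 1  # the first attribute is Attribute.1, not Attribute.0
    for name, values in attributes.items():
        if isinstance(values, str):
            values = [values]  # a single value is treated as a one-element list
        for value in values:
            params[f"Attribute.{index}.Name"] = name
            params[f"Attribute.{index}.Value"] = value
            if replace:
                params[f"Attribute.{index}.Replace"] = "true"
            index += 1
    return params


# Illustrative item with one single-valued and one multi-valued attribute.
put_request = put_attributes_params(
    "Products", "item-001", {"color": "red", "size": ["S", "M"]}
)
```

An attribute with several values simply occupies several numbered slots that share the same name.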
To retrieve your data, simply issue a GetAttributes call to retrieve a specific item, or use Select, a query syntax very similar to the SQL Select, to query your data set for items that meet specified criteria.
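To show how close Select is to its SQL namesake, the sketch below builds a Select expression from a domain name and a set of equality conditions. It assumes Select's usual quoting conventions (backticks around names, single quotes around values, each escaped by doubling); the helper names and sample data are hypothetical.

```python
def quote_name(name):
    """Quote a domain or attribute name with backticks."""
    return "`" + name.replace("`", "``") + "`"


def quote_value(value):
    """Quote an attribute value with single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_select(domain, conditions=None):
    """Build a Select expression matching items on all given attribute values."""
    expression = f"select * from {quote_name(domain)}"
    if conditions:
        clauses = [
            f"{quote_name(attribute)} = {quote_value(value)}"
            for attribute, value in conditions.items()
        ]
        expression += " where " + " and ".join(clauses)
    return expression


# Illustrative query: all red items in a Products domain.
select_expression = build_select("Products", {"color": "red"})
```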
Service Differences & Implications
A primary difference between the services is the data model. For relational databases built on Amazon EC2/EBS or for Amazon RDS, the data model is, quite clearly, relational, as depicted below:

Figure 1. Simple relational data model.
In simple terms, the data is normalized into separate tables, with primary key/foreign key relationships associating the tables to one another. With Amazon SimpleDB, however, there is no notion of relations, and no requirement to develop a (sometimes complex) schema to represent your data. Rather, you organize your data set into domains, and can run queries across all of the data stored in a particular domain. Domains are collections of items that are described by attribute/value pairs. While some developers choose to mimic the relational model (e.g. creating a Products domain, an Orders domain, and so on) all of the data could in fact be co-mingled. In addition, Amazon SimpleDB allows you to easily go back later and add new attributes that only apply to certain items. Thus, Amazon SimpleDB provides the developer with greater flexibility in data storage, but at the cost of less embedded functionality.
An example of the flexibility/functionality trade-off is complex transactions. Amazon RDS (and self-managed relational databases on Amazon EC2), by nature of their relational models, allow for set-based updates and deletes. For an example, refer again to the simple data model in Figure 1. An insert that creates a new order will flow through the various tables to update values like InStockQuantity in the Products table. Amazon SimpleDB, due to its flat data model, cannot support such cascading updates. Each item is treated autonomously, with no relation to other items in the same or different domains.
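The order example can be sketched with Python's built-in sqlite3 module standing in for the relational engine. Only the Products table and its InStockQuantity column are named in the text; the Orders table, the other columns, and the place_order helper are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (
    ProductId INTEGER PRIMARY KEY,
    Name TEXT NOT NULL,
    InStockQuantity INTEGER NOT NULL CHECK (InStockQuantity >= 0)
);
CREATE TABLE Orders (
    OrderId INTEGER PRIMARY KEY,
    ProductId INTEGER NOT NULL REFERENCES Products(ProductId),
    Quantity INTEGER NOT NULL
);
INSERT INTO Products VALUES (1, 'Widget', 10);
""")


def place_order(conn, product_id, quantity):
    """Record an order and decrement stock as one atomic transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "INSERT INTO Orders (ProductId, Quantity) VALUES (?, ?)",
            (product_id, quantity),
        )
        conn.execute(
            "UPDATE Products SET InStockQuantity = InStockQuantity - ? "
            "WHERE ProductId = ?",
            (quantity, product_id),
        )


place_order(conn, 1, 3)
stock = conn.execute(
    "SELECT InStockQuantity FROM Products WHERE ProductId = 1"
).fetchone()[0]
```

If the stock update violates the CHECK constraint, the order row inserted in the same transaction is rolled back with it, which is the kind of coordinated change a flat item store does not provide on its own.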
Relational databases on Amazon EC2 and Amazon RDS are strictly consistent, meaning that any attempt to read an item will always return the very latest update to that item. To ensure this strict consistency, the relational databases lock each record during an update, making it unavailable to be read. Conversely, Amazon SimpleDB uses eventual consistency and allows what are called dirty reads, returning a response from whichever replica can respond the fastest. If an update is in process on another replica, the user is effectively reading an outdated value for the item. In general, however, most inserts and updates propagate across the multiple Amazon SimpleDB replicas within 1-2 seconds, lowering the probability of reading an outdated value.
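The effect of eventual consistency can be illustrated with a toy model of one item stored on several replicas. This is not how Amazon SimpleDB is implemented; the class and its methods are hypothetical and exist only to show why a read served by a lagging replica returns an older value until propagation completes.

```python
class ReplicatedItem:
    """Toy model: one value copied to several replicas with delayed propagation."""

    def __init__(self, replica_count, value):
        self.replicas = [value] * replica_count
        self.pending = []  # (replica index, value) updates not yet applied

    def write(self, value):
        """Apply the write to replica 0 and queue it for the others."""
        self.replicas[0] = value
        self.pending = [(i, value) for i in range(1, len(self.replicas))]

    def propagate(self):
        """Deliver all queued updates, bringing every replica up to date."""
        for index, value in self.pending:
            self.replicas[index] = value
        self.pending = []

    def read(self, replica_index):
        """Return whatever the chosen replica currently holds."""
        return self.replicas[replica_index]


item = ReplicatedItem(replica_count=3, value="v1")
item.write("v2")
stale = item.read(2)   # replica 2 has not received the update yet
item.propagate()
fresh = item.read(2)   # after propagation every replica agrees
```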
A final primary difference between these solutions is in the service model. Each product provides some automation to make it much simpler to provision than an on-premise solution. However, the amount of administration that the service handles varies, as shown below:
| Build your own on Amazon EC2/EBS | Amazon RDS | Amazon SimpleDB |
|---|---|---|
| Automated hardware provisioning | Automated hardware provisioning | Automated hardware provisioning |
| User-controlled software updates/patching | Automated software updates/patching | Automated software updates/patching |
| User initiated backups or snapshots | Automated backups (administered by user) and user initiated snapshots | Automated geo-redundant replication |
| User responsible for indexing, query tuning | User responsible for indexing, query tuning | Automated indexing, query tuning |