Amazon SimpleDB provides a simple web services interface to create and store multiple datasets, query your data easily, and return the results. Once items are stored in a domain, Amazon SimpleDB responds to changes in traffic by charging you only for the compute and storage resources actually consumed in serving your requests (e.g. Select or GetAttributes). However, with Amazon SimpleDB, the true key to scalability in both request throughput and data volume is leveraging the service’s horizontal scale-out architecture. Since Amazon SimpleDB is designed with parallelism in mind, you can obtain higher throughput by creating additional domains and spreading your data and requests across them. By spreading your data and requests across multiple domains (and thus, machine resources), you benefit from a greater “surface area” of compute resources to perform requests and queries against. For example, if you spread your data across 10 domains and execute 10 queries in parallel, you will get much higher throughput than performing 10 queries sequentially against a single domain that contains all of your data.
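One way to picture the scale-out approach described above is a small routing helper. The following is a minimal sketch, not part of the service: the domain prefix "mydata", the hash-based partitioning scheme, and the run_query callable are all hypothetical choices made for illustration.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def domain_for(item_name, prefix="mydata", num_domains=10):
    """Map an item name to one of num_domains domains using a stable hash.

    The prefix and the partitioning scheme are illustrative assumptions.
    """
    digest = hashlib.md5(item_name.encode("utf-8")).hexdigest()
    return f"{prefix}_{int(digest, 16) % num_domains}"


def query_all_domains(run_query, prefix="mydata", num_domains=10):
    """Call run_query(domain_name) for every domain in parallel and merge rows.

    run_query is a caller-supplied function (hypothetical here) that issues
    one query against a single domain and returns a list of results.
    """
    domains = [f"{prefix}_{i}" for i in range(num_domains)]
    with ThreadPoolExecutor(max_workers=num_domains) as pool:
        per_domain = list(pool.map(run_query, domains))
    return [row for rows in per_domain for row in rows]
```

Writes for a given item always go to domain_for(item_name), while a query that must see all data fans out across every domain and merges the results on the client.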
The flexibility of Amazon SimpleDB allows you to change your data model on the fly, adding or removing attributes without breaking a rigid schema. As a result, you can reflect changes to your application and business quickly without costly refactoring or painful schema updates. You can also choose between consistent or eventually consistent read requests, gaining the flexibility to match read performance (latency and throughput) and consistency requirements to the demands of your application, or even disparate parts within your application.
With Amazon SimpleDB, what the service doesn’t require you to do is equally important. Amazon SimpleDB automatically manages infrastructure provisioning, hardware and software maintenance, replication and indexing of data items, and performance tuning.
AWS provides a number of database alternatives for developers. Amazon SimpleDB provides simple index and query capabilities with seamless scalability. Amazon RDS enables you to run a fully featured relational database while offloading database administration. And, using one of our many relational database AMIs on Amazon EC2 and Amazon EBS allows you to operate your own relational database in the cloud. There are important differences between these alternatives that may make one more appropriate for your use case.
See Running Databases on AWS for additional guidance on which solution is best for you.
Amazon S3 stores raw data. Amazon SimpleDB takes your data as input and indexes all the attributes, enabling you to quickly query that data. Additionally, Amazon S3 and Amazon SimpleDB use different types of physical storage. Amazon S3 uses dense storage drives that are optimized for storing larger objects inexpensively. Amazon SimpleDB stores smaller bits of data and uses less dense drives that are optimized for data access speed.
In order to optimize your costs across AWS services, large objects or files should be stored in Amazon S3, while smaller data elements or file pointers (possibly to Amazon S3 objects) are best saved in Amazon SimpleDB. Because of the close integration between services and the free data transfer within the AWS environment, developers can easily take advantage of both the speed and querying capabilities of Amazon SimpleDB as well as the low cost of storing data in Amazon S3, by integrating both services into their applications. To learn more about the benefits of using Amazon SimpleDB in conjunction with Amazon S3, follow this link.
There are several factors to consider based on your specific application. You may want to store your data in a Region that…
Amazon SimpleDB supports two read consistency options: eventually consistent reads and consistent reads.
Eventually Consistent Reads (Default). The eventually consistent read option maximizes your read performance (in terms of low latency and high throughput). However, an eventually consistent read (using Select or GetAttributes) might not reflect the results of a recently completed write (using PutAttributes, BatchPutAttributes, DeleteAttributes). Consistency across all copies of data is usually reached within a second; repeating a read after a short time should return the updated data.
Consistent Reads. In addition to eventually consistent reads, Amazon SimpleDB also gives you the flexibility and control to request a consistent read if your application, or an element of your application, requires it. A consistent read (using Select or GetAttributes with ConsistentRead=true) returns a result that reflects all writes that received a successful response prior to the read.
By default, GetAttributes and Select perform an eventually consistent read. Since a consistent read can potentially incur higher latency and lower read throughput, it is best to use it only when an application scenario mandates that a read operation absolutely needs to read all writes that received a successful response prior to that read. For all other scenarios the default eventually consistent read will yield the best performance. To learn more about consistency options with Amazon SimpleDB, please see our Developer Guide.
As previously mentioned, the flexibility Amazon SimpleDB provides in specifying your read consistency requirements is important because different types of applications and use cases may have different requirements in terms of performance and consistency. Note also that Amazon SimpleDB allows you to specify consistency settings for each individual read request, so the same application could have disparate parts following different consistency settings. Here is some guidance on times when each read consistency option may be most appropriate:
Eventually Consistent Reads:
Any application (or part of an application) that values read performance (latency and throughput) more highly than strong consistency is well suited to eventually consistent reads. Data with a high read-to-write ratio often fits this description: for example, friend/follower lists, photo tags, and personal details within a social network. More generally, eventually consistent reads suit use cases where providing an answer quickly is more important than providing the most up-to-date answer. An example might be an ad network, where showing users an ad from inventory as fast as possible is more important than showing an ad chosen by logic updated within the past second.
Another guideline is whether your application can rely on user-perceived consistency. Imagine an application that involves direct user interaction rather than programmatic access: a user updates a blog post and then hits refresh, or another user posts a comment to the blog. The delay before the user next looks at the data is what we refer to as user-perceived consistency; as long as the data is consistent in time for the end user to see it, the application can use eventual consistency. In these scenarios, the time required for a write to reach all copies of the data is shorter than the time lag before the customer expects the new data to be visible (e.g., refreshes the page). As mentioned previously, Amazon SimpleDB usually reaches consistency within a second. If end users of your application will not notice or care whether updates are reflected within a second, eventual consistency makes sense for its general read performance benefits.
When an item is updated an eventually consistent read may return the current value or the old value. When an item is inserted an eventually consistent read may not return the item.
Consistent Reads:
Depending on your application, you may need users who read a data item to see the most recently updated version from among many concurrent write updates. For example, you may be running a statistics or reporting application where you can’t accept the risk that a recent write operation is not reflected in the results of a GetAttributes call or Select query. In such a case, passing the ConsistentRead=true parameter will provide consistent results.
Storing application in-memory state in SimpleDB is another example. As the value of the application state changes, the application can update SimpleDB. If the application goes down and needs to be restarted then the application can issue a consistent GetAttributes or Select call to SimpleDB to obtain the last updated application state.
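The restart scenario above can be sketched as follows. The parameter names DomainName, ItemName, and ConsistentRead follow the SimpleDB API; everything else is an assumption for illustration, in particular the client object (taken here to be a boto3-style SimpleDB client whose get_attributes accepts these keyword arguments) and the response shape with an "Attributes" list of Name/Value entries.

```python
def get_attributes_params(domain, item, consistent=False):
    """Build GetAttributes request parameters.

    ConsistentRead is only sent when requested, so the default remains
    an eventually consistent read.
    """
    params = {"DomainName": domain, "ItemName": item}
    if consistent:
        params["ConsistentRead"] = True
    return params


def load_state(client, domain, item):
    """Recover the last saved application state with a consistent read.

    `client` is assumed to expose get_attributes(**params) and to return a
    dict with an "Attributes" list; this is a sketch, not a verified API.
    Multi-valued attributes would keep only their last value here.
    """
    params = get_attributes_params(domain, item, consistent=True)
    response = client.get_attributes(**params)
    return {a["Name"]: a["Value"] for a in response.get("Attributes", [])}
```

Because consistency is chosen per request, the same application could call get_attributes_params with consistent=False on its high-traffic read paths and reserve consistent=True for recovery.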
To learn more about consistency with Amazon SimpleDB, please refer to the Amazon SimpleDB Developer Guide or Consistency Enhancements Whitepaper.
Amazon SimpleDB is not a relational database and sacrifices complex transactions and relations (i.e., joins) in order to provide unique functionality and performance characteristics. However, Amazon SimpleDB does offer transactional semantics such as:
Conditional Puts/Deletes — enable you to insert, replace, or delete values for one or more attributes of an item if the existing value of an attribute matches the value you specify. If the value does not match or is not present, the update is rejected. Conditional Puts/Deletes are useful for preventing lost updates when different sources write concurrently to the same item.
Conditional puts and deletes are exposed via the PutAttributes and DeleteAttributes APIs by specifying an optional condition with an expected value. For example, if your application was reserving seats or selling tickets to an event, you might allow a purchase (i.e., write update) only if the specified seat was still available (the optional condition). These semantics can also be used to implement functionality such as counters, inserting an item only if it does not already exist, and optimistic concurrency control (OCC). An application can implement OCC by maintaining a version number (or a timestamp) attribute as part of an item and by performing a conditional put/delete based on the value of this version number.
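The optimistic concurrency pattern described above can be illustrated with a small in-memory simulation. This is a sketch of the semantics only: InMemoryDomain, conditional_put, and increment_counter are hypothetical names and do not call the SimpleDB API, and the "version" and "count" attribute names are chosen purely for the example.

```python
class InMemoryDomain:
    """Toy stand-in for a domain that supports conditional writes."""

    def __init__(self):
        self.items = {}

    def conditional_put(self, item_name, attributes, expected_name, expected_value):
        """Apply the write only if expected_name currently equals expected_value.

        An expected_value of None means the attribute must not exist yet.
        Returns True if the write was applied, False if it was rejected.
        """
        current = self.items.get(item_name, {})
        if current.get(expected_name) != expected_value:
            return False
        self.items[item_name] = {**current, **attributes}
        return True


def increment_counter(domain, item_name, max_retries=5):
    """Increment a counter using a version attribute for OCC."""
    for _ in range(max_retries):
        current = domain.items.get(item_name, {})
        version = current.get("version")  # None when the item is new
        count = int(current.get("count", "0"))
        update = {
            "count": str(count + 1),
            "version": str(int(version or "0") + 1),
        }
        if domain.conditional_put(item_name, update, "version", version):
            return count + 1
        # Another writer changed the item first: re-read and retry.
    raise RuntimeError("too many conflicting writers")
```

A writer holding a stale version number has its update rejected rather than silently overwriting a newer value, which is the lost-update protection the conditional operations are meant to provide.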
To learn more about transactional semantics with Amazon SimpleDB, please refer to the Amazon SimpleDB Developer Guide.
You can get started with SimpleDB for free and without risk. Under the free tier program, you pay no charges on the first 25 Machine Hours, 1 GB of Storage, and 1 GB of Data Transfer Out that you consume every month. Under the new free inbound data transfer promotion, all data transfer into Amazon SimpleDB is free of charge until June 30, 2010. Amazon SimpleDB lets developers pay only for what they consume and there is no minimum fee.
Machine Utilization
- First 25 Amazon SimpleDB Machine Hours consumed per month are free
- $0.14 per Amazon SimpleDB Machine Hour consumed thereafter for the US-East (Northern Virginia) Region, or $0.154 per Machine Hour thereafter for the EU (Ireland) Region and US-West (Northern California) Region
Amazon SimpleDB measures the machine utilization of each request and charges based on the amount of machine capacity used to complete the particular request (QUERY, GET, PUT, etc.), normalized to the hourly capacity of a circa 2007 1.7 GHz Xeon processor.
Data Transfer
- $0.000 per GB – all data transfer into SimpleDB (through June 30, 2010)*
- First 1 GB of data transferred out per month is free; thereafter:
- $0.15 per GB – first 10 TB / month data transfer out
- $0.11 per GB – next 40 TB / month data transfer out
- $0.09 per GB – next 100 TB / month data transfer out
- $0.08 per GB – data transfer out / month over 150 TB
*Data transfer in will be $0.10 per GB after June 30, 2010.
Data transfer “in” and “out” refers to transfer into and out of Amazon SimpleDB. Data transferred between Amazon SimpleDB and other Amazon Web Services in the same region is free of charge (i.e., $0.00 per GB).
Structured Data Storage
- First 1 GB stored per month is free
- $0.25 per GB-month thereafter for the US-East (Northern Virginia) Region, or $0.275 per GB-month thereafter for the EU (Ireland) Region and the US-West (Northern California) Region.
Amazon SimpleDB measures the size of your billable data by adding the raw byte size of the data you upload + 45 bytes of overhead for each item, attribute name and attribute-value pair.
The following examples refer to charges for usage beyond the free usage levels described above. As previously described, usage below the monthly free tier is provided at no charge.
Machine Utilization:
Amazon SimpleDB measures the machine utilization of each request and charges based on the amount of machine capacity used to complete the particular request (QUERY, GET, PUT, etc.), normalized to the hourly capacity of a circa 2007 1.7 GHz Xeon processor. Machine utilization is driven by the amount of data (# of attributes, length of attributes) processed by each request. A GET operation that retrieves 256 attributes will use more resources than a GET that retrieves only 1 attribute. A multi-predicate QUERY that examines 100,000 attributes will cost more than a single predicate query that examines 250.
In the response message for each request, Amazon SimpleDB returns a field called Box Usage. Box Usage is the measure of machine resources consumed by each request. It does not include bandwidth or storage. Box usage is reported as the portion of a machine hour used to complete a particular request. For the US-East (Northern Virginia) Region, the cost of an individual request is Box Usage (expressed in hours) * $0.14 per Amazon SimpleDB Machine hour. The cost of all your requests is the sum of Box Usage (expressed in hours) * $0.14.
For example, if over the course of a month, the sum of the Box Usage for your requests uses the equivalent of one 1.7 GHz Xeon processor for 9 hours, your charge will be:
9 hours * $0.14 per Amazon SimpleDB Machine hour = $1.26.
If your query domains are located in the EU (Ireland) Region or US-West (Northern California) Region, Amazon SimpleDB Machine hours are priced at $0.154 per Machine hour, and all cost calculations should be adjusted accordingly.
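The machine utilization calculation above can be sketched in a few lines. The rates are the ones quoted in this section; the region keys ("us-east", "eu", "us-west") and the function name are hypothetical labels chosen for the example, and the free tier is deliberately ignored, as in the worked example.

```python
# Price per Amazon SimpleDB Machine Hour, as quoted above.
# The region keys are illustrative labels, not official identifiers.
RATE_PER_MACHINE_HOUR = {"us-east": 0.14, "eu": 0.154, "us-west": 0.154}


def machine_cost(box_usages, region="us-east"):
    """Sum per-request Box Usage values (in machine hours) and price them."""
    return sum(box_usages) * RATE_PER_MACHINE_HOUR[region]


# Requests whose Box Usage adds up to 9 machine hours in US-East.
print(round(machine_cost([4.5, 4.5]), 2))  # 1.26
```

In practice the box_usages list would be filled from the Box Usage field returned with each response.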
Data Transfer Example:
You transfer 500 MB of data out of Amazon SimpleDB each day during the month of March.
Total Data Transfer Out for the month = 500 MB x (1 GB / 1,024 MB) x 31 days = 15.14 GB
Total charge = 15.14 GB x ($0.15 / GB) = $2.27
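The same arithmetic, extended to the tiered rates listed earlier, might look like the sketch below. It assumes 1 TB = 1,024 GB (consistent with the 1 GB = 1,024 MB conversion used above) and, like the worked example, ignores the free first GB; the names TIERS and transfer_out_cost are hypothetical.

```python
# (tier size in GB, price per GB) for data transfer out, in order.
# Assumes 1 TB = 1,024 GB.
TIERS = [
    (10 * 1024, 0.15),     # first 10 TB / month
    (40 * 1024, 0.11),     # next 40 TB / month
    (100 * 1024, 0.09),    # next 100 TB / month
    (float("inf"), 0.08),  # over 150 TB / month
]


def transfer_out_cost(gb):
    """Price a month's data transfer out by walking the tiers in order."""
    cost = 0.0
    remaining = gb
    for size, rate in TIERS:
        used = min(remaining, size)
        cost += used * rate
        remaining -= used
        if remaining <= 0:
            break
    return cost


march_gb = 500 / 1024 * 31  # 500 MB per day for 31 days
print(round(march_gb, 2), round(transfer_out_cost(march_gb), 2))  # 15.14 2.27
```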
Storage
The best way to predict the size of your structured data storage is as follows:
Raw byte size of all item IDs + 45 bytes per item + raw byte size of all attribute names + 45 bytes per attribute name + raw byte size of all attribute-value pairs + 45 bytes per attribute-value pair, with the total converted to GB
To calculate your estimated monthly storage cost for the US-East (Northern Virginia) Region, take the resulting size in GB and multiply by $0.25. Alternatively, for the EU (Ireland) Region or the US-West (Northern California) Region, take the resulting size in GB and multiply by $0.275.
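As a rough sketch of the storage estimate above, the function below adds the raw sizes and the 45-byte overheads and converts the total to GB. It assumes 1 GB = 1,024^3 bytes and ignores the free first GB; the function and parameter names are hypothetical, and the caller supplies the counts and raw sizes.

```python
OVERHEAD_BYTES = 45  # per item, per attribute name, per attribute-value pair


def billable_bytes(item_id_bytes, item_count,
                   attr_name_bytes, attr_name_count,
                   pair_bytes, pair_count):
    """Raw byte sizes plus 45 bytes of overhead for each counted element."""
    raw = item_id_bytes + attr_name_bytes + pair_bytes
    overhead = OVERHEAD_BYTES * (item_count + attr_name_count + pair_count)
    return raw + overhead


def monthly_storage_cost(total_bytes, rate_per_gb=0.25):
    """Convert bytes to GB (assuming 1 GB = 1024**3 bytes) and apply the rate."""
    return total_bytes / 1024 ** 3 * rate_per_gb
```

For example, one item with a 10-byte ID, one 5-byte attribute name, and two attribute-value pairs totalling 20 bytes gives 10 + 5 + 20 raw bytes plus 4 x 45 bytes of overhead, or 215 billable bytes. Pass rate_per_gb=0.275 for the EU (Ireland) or US-West (Northern California) Region.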
We charge less where our costs are less. For example, our costs are lower in the Northern Virginia Region than in the Northern California Region.
You organize your structured data into domains and can run queries across all of the data stored in a particular domain. Domains are comprised of items, and items are described by attribute-value pairs. To understand these elements, consider the metaphor of data stored in a spreadsheet table. An Amazon SimpleDB domain is like a worksheet, items are like rows of data, attributes are like column headers, and values are the data entered in each of the cells.
However, unlike a spreadsheet, Amazon SimpleDB allows for multiple values to be associated with each “cell” (e.g., for item “123,” the attribute “color” can have both value “blue” and value “red”). Additionally, in Amazon SimpleDB, each item can have its own unique set of associated attributes (e.g., item “123” might have attributes “description” and “color” whereas item “789” has attributes “description,” “color” and “material”). Amazon SimpleDB automatically indexes your data, making it easy to quickly find the information that you need. There is no need to pre-define a schema or change a schema if new data is added later.
The service runs within Amazon’s high-availability data centers to provide strong and consistent performance. To prevent data from being lost or becoming unavailable, your fully indexed data is stored redundantly across multiple servers and data centers. This reliability is consistent across all Amazon SimpleDB Regions.
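The domain, item, and attribute model described above can be pictured with ordinary Python collections. This is only an in-memory illustration of the shape of the data, not how the service stores or indexes it; the attribute values other than “blue” and “red” are made up for the example, and the select helper is hypothetical.

```python
from collections import defaultdict

# domain -> item name -> attribute name -> set of values
domain = defaultdict(lambda: defaultdict(set))

# Item "123": the "color" attribute holds two values in one "cell".
domain["123"]["description"].add("sweater")  # illustrative value
domain["123"]["color"].update({"blue", "red"})

# Item "789" has an extra attribute; no schema change is needed.
domain["789"]["description"].add("scarf")    # illustrative value
domain["789"]["color"].add("green")          # illustrative value
domain["789"]["material"].add("wool")        # illustrative value


def select(domain, attribute, value):
    """Return the names of items whose attribute contains the given value."""
    return sorted(
        name for name, attrs in domain.items()
        if value in attrs.get(attribute, ())
    )
```

Here select(domain, "color", "blue") matches only item “123”, and querying an attribute that an item lacks simply yields no match for that item.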
Anyone can use Amazon SimpleDB. You just have to decide which Region you want Amazon SimpleDB to store your data in.