Query 101: Building Amazon SimpleDB Queries
Recently enroll in Amazon SimpleDB? Learn about constructing Amazon SimpleDB queries by walking through a number of hands-on examples. Don't forget to give it a grade when you're done.
Submitted By: Bruce@AWS
AWS Products Used: Amazon SimpleDB
Created On: February 07, 2008
By Aleksey Martynov, Amazon Web Services
Amazon SimpleDB is a web service for running queries on structured data in real time. This service works in close conjunction with Amazon Simple Storage Service (Amazon S3) and Amazon Elastic Compute Cloud (Amazon EC2), collectively providing the ability to store, process and query data sets in the cloud. These services are designed to make web-scale computing easier and more cost-effective for developers.
Amazon SimpleDB Query Language
Amazon SimpleDB introduces a simple custom query language that enables you to search against attribute name-value pairs associated with items. The query language takes query expressions similar to the standard SQL SELECT statement.
This section describes Select, a function that takes query expressions similar to the standard SQL SELECT statement.
Use the following format for the Select function.
select output_list from domain_name [where expression] [sort_instructions] [limit limit]
The output_list can be any of the following:
-
* (all attributes)
-
itemName() (the item name only)
-
count(*)
-
An explicit list of attributes (attribute1,..., attributeN)
The domain_name is the domain to search.
The expression is the match expression.
The sort_instructions describe how to sort the results.
The limit is the maximum number of results to return (default: 100, max. 250).
The expression can be any of the following:
-
-
NOT )
-
-
-
Attribute values must be quoted with a single or double quote. If a quote appears within the attribute value, it must be escaped with the same quote symbol. These following two expressions are equivalent:
select * from mydomain where attr1 = 'He said, "That''s the ticket!"' select * from mydomain where attr1 = "He said, ""That's the ticket!"""
Attribute and domain names may appear without quotes if they contain only letters, numbers, underscores (_), or dollar symbols ($) and do not start with a number. You must quote all other attribute and domain names with the backtick (`).
select * from mydomain where `timestamp-1` > '1194393600'
You must escape the backtick when it appears in the attribute or domain name by replacing it with two backticks. For example, we can retrieve any items that have the attribute abc`123 set to the value 1 with this select expression:
select * from mydomain where `abc``123` = '1'
It is important to remember that Amazon SimpleDB is a schema-less datastore and everything is stored as a UTF-8 string value. This provides application designers with the flexibility of not predefining different data types for their attributes, but rather changing them dynamically, as the application requires. A good example of why this flexibility is useful is “1984” – should it be treated as an integer, a title of a book, or a date? What happens if the application decides to store it as an integer, only to later realize that it was meant to be the title of a book and should be treated as a string? Amazon SimpleDB provides the flexibility of storing all data in one format, allowing developers to make data type decisions in the application layer without the data store enforcing constraints. As a result, all predicate comparisons are lexicographical in nature. Therefore, when designing an application, you should carefully think through storing data in its appropriate string representation.
Amazon SimpleDB Comparison Operators
Comparison Operator |
Description |
Example |
---|---|---|
= | Attribute value equals the specified constant. |
select * from mydomain where city = 'Seattle' |
!= | Attribute value does not equal to the specified constant. |
select * from mydomain where name != 'John' |
> | Attribute value is greater than the specified constant. | select * from mydomain where weight > '0034' |
>= | Attribute value is greater than or equal to the specified constant. | select * from mydomain where weight >= '065' |
< | Attribute value is less than the specified constant. | select * from mydomain where weight < '0034' |
<= | Attribute value is less than or equal to the specified constant. | select * from mydomain where year <= '2000' |
like |
Attribute value starts with the specified constant.
|
select * from mydomain where author like 'Rob%' |
not like |
Attribute value does not start with the specified constant.
|
select * from mydomain where author not like 'Henry%' |
between | Attribute value falls within a range, including the start and end value. | select * from mydomain where year between '1998' and '2000' |
in | Attribute value is equal to one of the specified constants. | select * from mydomain where year in('1998','2000','2003') |
is null |
Attribute does not exist. If an item has the attribute with an empty string, it is not returned. |
select * from mydomain where year is null |
is not null | Attribute value contains any value. | select * from mydomain where year is not null |
every() |
For multi-valued attributes, every attribute value must satisfy the constraint. |
select * from mydomain where every(keyword) = 'Book' |
Query syntax also allows for a combination of and and or operators between comparisons. There are two important things to keep in mind about multiple comparisons within the predicate:
- All operators are evaluated in left-to-right order.
- Users can specify up to 5 comparisons within a single predicate.
Amazon SimpleDB also supports set operations between predicates. It is important to understand the difference between comparison operators within the predicate and set operations between them: comparison operators can be thought of as definitions of a search range over a single attribute value range, while set operators execute against sets of resulting item names for each predicate and do not take attribute values into account at all.
Note: Negation (not) is not a set operation. It only applies negation to the comparison operator. For example, select * from mydomain where not year = '2007' is the same as select * from mydomain where year != '2007'.
Amazon SimpleDB Set Operators
Set Operator |
Description |
Example |
---|---|---|
intersection |
Determine the intersection of item name sets |
|
The best way to setup your application for executing queries against Amazon SimpleDB is to follow the steps outlined in the Amazon SimpleDB Getting Started Guide:
https://docs.amazonwebservices.com/AmazonSimpleDB/latest/GettingStartedGuide/
Developers should also refer to the Amazon SimpleDB Developer Guide for technical details on the service:
https://docs.amazonwebservices.com/AmazonSimpleDB/latest/DeveloperGuide/
Sample Dataset
Item Name |
Title |
Author |
Year |
Number of Pages |
Keywords | Rating |
---|---|---|---|---|---|---|
0385333498 | The Sirens of Titan | Kurt Vonnegut | 1959 | 00336 |
Book Paperback |
***** 5 stars Excellent |
0802131786 | Tropic of Cancer | Henry Miller | 1934 | 00318 | Book |
**** |
1579124585 | The Right Stuff | Tom Wolfe | 1979 | 00304 |
Book Hardcover American |
**** 4 stars |
B000T9886K | In Between | Paul Van Dyk | 2007 |
CD Trance |
4 stars |
|
B00005JPLW | 300 | Zack Snyder | 2007 |
DVD Action Frank Miller |
*** 3 stars Not bad |
|
B000SF3NGK | Heaven's Gonna Burn Your Eyes | Thievery Corporation | 2002 | ***** |
The following examples show some basic queries using comparison operators.
Simple Query Examples
Query Expression |
Description |
Result Set |
---|---|---|
select * from mydomain where Title = 'The Right Stuff' | Retrieves all items where the attribute "Title" equals "The Right Stuff." | 1579124585 |
select * from mydomain where Year > '1985' |
Retrieves all items where "Year" is greater than "1985." |
B000T9886K, B00005JPLW, B000SF3NGK |
select * from mydomain where Rating like '****%' |
Retrieves all items that have at least a 4 star (****) rating. |
0385333498, 1579124585, 0802131786, B000SF3NGK |
select * from mydomain where Pages < '00320' |
Retrieves all items that have less than 320 pages. |
1579124585, 0802131786, |
As you can see, writing simple query expressions is a straightforward and intuitive task. The important thing to remember is the lexicographical nature of comparison operators and the attribute composition of your dataset.
Range Queries
Amazon SimpleDB enables users to execute more than one comparison against the attribute values within the same predicate. One of the common examples of this functionality is when you want to specify a range of values.
Range Query Examples
Query expression |
Description |
Result set |
---|---|---|
select * from mydomain where Year > '1975' and Year < '2008' | Retrieves all items that have a "Year" value between "1975" and "2008." | 1579124585, B000T9886K, B00005JPLW, B000SF3NGK |
select * from mydomain where Year between '1975' and '2008' | Retrieves all items that have a "Year" value between "1975" and "2008." | 1579124585, B000T9886K, B00005JPLW, B000SF3NGK |
select * from mydomain where Rating = '***' or Rating = '*****' |
Retrieves all items that have 3 (***) or 5 (*****) star rating This is a discontiguous range query that consists of two distinct values selected from the range of all possible values for the attribute. |
0385333498, B00005JPLW, B000SF3NGK |
select * from mydomain where (Year > '1950' and Year < '1960') or Year like '193%' or Year = '2007' | Retrieves all items where the "Year" attribute is either between "1950" and "1960," falls in the nineteen-thirties, or equals "2007." | 0385333498, 0802131786, B000T9886K, B00005JPLW |
Multi-Valued Attribute Queries
One of the major differences of Amazon SimpleDB when compared to the traditional RDBMS systems is that it allows for multiple values to be associated with the same attribute name. In the Internet world, attributes such as “tag” or “keyword” may and often do contain multiple values. Amazon SimpleDB data model and query language make it very easy to support such attributes.
There are several important things to understand about predicate execution against attributes with multiple values:
- Each attribute value is considered individually against the comparison conditions defined in the predicate.
- Item names will be selected if ANY of the values match the predicate condition
Multi-Valued Attribute Query Examples
Query expression |
Description |
Result set |
---|---|---|
select * from mydomain where Rating = '4 stars' or Rating = '****' |
Retrieves all items with a 4 star (****) rating. The data set has this rating stored as both "4 stars" and "****." Amazon SimpleDB returns items that have either or both. |
1579124585, 0802131786, B000T9886K |
select * from mydomain where Keyword = 'Book' and Keyword = 'Hardcover' |
Retrieve all items that have the Keyword attribute as both "Book" and "Hardcover." Based on the data set, you might be surprised that the result did not return the "1579124585" item. As described earlier, each value is evaluated individually against the predicate expression. Since neither of the values satisfies both comparisons defined in the predicate, the item name is not selected. To get the desired results, you can use the select * from mydomain where Keyword = 'Book' intersection Keyword = 'Hardcover' expression. For more information, see Multiple Attribute Queries. |
|
select * from mydomain where every(keyword) in ('Book', 'Paperback') | Retrieves all items where the only keyword is Book or Paperback. If the item contains any other keyword entries, it is not returned. |
0385333498, 0802131786 |
Multiple Predicate Queries
The previous examples show how to create expressions for single predicates. The Amazon SimpleDB query language also supports constructing expressions across multiple predicates using the intersection operator.
Multiple attribute queries work by producing a set of item names from each predicate and applying the intersection operator. The intersection operator only returns item names that appear in both result sets.
This section shows multiple attribute queries and their results.
The following table shows some multiple attribute queries, how they are interpreted, and the results they return from the sample dataset.
Multiple Predicate Examples
Query Expression |
Description |
Result Set |
---|---|---|
select * from mydomain where Rating = '****' | Retrieves all items that have a "****" Rating. |
0802131786, 1579124585 |
select * from mydomain where every(Rating) = '****' | Retrieves all items that only have a "****" Rating. Items are not returned that have a multi-valued Rating attribute that contains any value other than '****.' |
0802131786 |
select * from mydomain where Keyword = 'Book' intersection Keyword = 'Hardcover' |
Retrieves all items that have a "Book" Keyword and a "Hardcover" Keyword. The first predicate produces 0385333498, 0802131786, and 1579124585. The second produces 1579124585. The intersection operator returns results that appear in both queries. |
1579124585 |
Sorting Data
Amazon SimpleDB supports sorting data on a single attribute, in ascending (default) or descending order. This section describes how to sort the result set returned from Select.
Note: All sort operations are performed in lexicographical order.
The sort attribute must be present in at least one of the predicates of the expression.
Sort cannot be applied to expressions that contain the is null predicate operator.
The following table shows sort queries, how they are interpreted, and the results they return from the sample dataset.
Query Expression |
Description |
Result Set |
---|---|---|
select * from mydomain where Year < '1980' order by Year asc | Retrieves all items released before 1980 and lists them in ascending order. |
0802131786, 0385333498, 1579124585 |
select * from mydomain where Year < '1980' order by Year | Same as the previous entry, with "asc" (ascending) omitted. |
0802131786, 0385333498, 1579124585 |
select * from mydomain where Year = '2007' intersection Author is not null order by Author desc | Retrieves all items released in 2007 and sorts them by author name in descending order. |
B00005JPLW, B000T9886K |
select * from mydomain order by Year asc | Invalid because Year is not constrained by a predicate in the where clause. |
400 Error |
select * from mydomain where Year < '1980' order by Year limit 2 | Retrieves two items that were released before 1980 and lists them in ascending order. |
0802131786, 0385333498 |
Counting Data
If you want to count the number of items in a result set instead of returning the items, use count(*). Instead of returning a list of items, Amazon SimpleDB returns a single item called Domain with a Count attribute.
Note: If the count request takes more than five seconds, Amazon SimpleDB returns the number of items that it could count and a next token to return additional results. The client is responsible for accumulating the partial counts.
If Amazon SimpleDB returns a 408 Request Timeout, please resubmit the request.
The default result limit of 100 and maximum result limit of 250 do not apply to count(*). However, you can restrict the maximum number of counted results using the limit clause.
The next token returned by count(*) and select are interchangeable as long as the where and order by clauses match. For example, if you want to return the 200 items after the first 10,000 (similar to an offset), you can perform a count with a limit clause of 10,000 and use the next token to return the next 200 items with select.
The following table shows count(*) queries and the results they return from the sample dataset.
Query Expression |
Description |
Result Set |
---|---|---|
select count(*) from mydomain where Title = 'The Right Stuff' | Counts all items where the attribute "Title" equals "The Right Stuff." | 1 |
select count(*) from mydomain where Year > '1985' | Counts all items where "Year" is greater than "1985." |
3 |
select count(*) from mydomain limit 500 | Counts all items in the domain, with a limit of 500. | 6 |
select count(*) from mydomain limit 4 | Counts all items in the domain, with a limit of 4. | 4 |