Query 101: Building Amazon SimpleDB Queries

Recently enroll in Amazon SimpleDB? Learn about constructing Amazon SimpleDB queries by walking through a number of hands-on examples. Don't forget to give it a grade when you're done.


Submitted By: Bruce@AWS
AWS Products Used: Amazon SimpleDB
Created On: February 07, 2008


By Aleksey Martynov, Amazon Web Services

Amazon SimpleDB is a web service for running queries on structured data in real time. This service works in close conjunction with Amazon Simple Storage Service (Amazon S3) and Amazon Elastic Compute Cloud (Amazon EC2), collectively providing the ability to store, process and query data sets in the cloud. These services are designed to make web-scale computing easier and more cost-effective for developers.

Amazon SimpleDB Query Language

Amazon SimpleDB introduces a simple custom query language that enables you to search against attribute name-value pairs associated with items. The query language takes query expressions similar to the standard SQL SELECT statement.

This section describes Select, a function that takes query expressions similar to the standard SQL SELECT statement.

Use the following format for the Select function.

select output_list  
from domain_name  
[where expression]  
[sort_instructions]  
[limit limit] 

The output_list can be any of the following:

  • * (all attributes)

  • itemName() (the item name only)

  • count(*)

  • An explicit list of attributes (attribute1,..., attributeN)

The domain_name is the domain to search.

The expression is the match expression.

The sort_instructions describe how to sort the results.

The limit is the maximum number of results to return (default: 100, max. 250).

The expression can be any of the following:

  • NOT )

Attribute values must be quoted with a single or double quote. If a quote appears within the attribute value, it must be escaped with the same quote symbol. These following two expressions are equivalent:

select * from mydomain where attr1 = 'He said, "That''s the ticket!"'  select * from mydomain where attr1 = "He said, ""That's the ticket!"""

Attribute and domain names may appear without quotes if they contain only letters, numbers, underscores (_), or dollar symbols ($) and do not start with a number. You must quote all other attribute and domain names with the backtick (`).

select * from mydomain where `timestamp-1` > '1194393600'

You must escape the backtick when it appears in the attribute or domain name by replacing it with two backticks. For example, we can retrieve any items that have the attribute abc`123 set to the value 1 with this select expression:

select * from mydomain where `abc``123` = '1'

It is important to remember that Amazon SimpleDB is a schema-less datastore and everything is stored as a UTF-8 string value. This provides application designers with the flexibility of not predefining different data types for their attributes, but rather changing them dynamically, as the application requires. A good example of why this flexibility is useful is “1984” – should it be treated as an integer, a title of a book, or a date? What happens if the application decides to store it as an integer, only to later realize that it was meant to be the title of a book and should be treated as a string? Amazon SimpleDB provides the flexibility of storing all data in one format, allowing developers to make data type decisions in the application layer without the data store enforcing constraints. As a result, all predicate comparisons are lexicographical in nature. Therefore, when designing an application, you should carefully think through storing data in its appropriate string representation.

Amazon SimpleDB Comparison Operators

Comparison Operator

Description

Example

= Attribute value equals the specified constant.

select * from mydomain where city = 'Seattle'
select * from mydomain where city = 'Seattle' or city = 'Portland'

!= Attribute value does not equal to the specified constant.

select * from mydomain where name != 'John'
select * from mydomain where name != 'John' and name != 'Humberto'

> Attribute value is greater than the specified constant. select * from mydomain where weight > '0034'
>= Attribute value is greater than or equal to the specified constant. select * from mydomain where weight >= '065'
< Attribute value is less than the specified constant. select * from mydomain where weight < '0034'
<= Attribute value is less than or equal to the specified constant. select * from mydomain where year <= '2000'
like

Attribute value starts with the specified constant.
Note: The like operator is similar to starts-with and only supports % at the end of the string.

select * from mydomain where author like 'Rob%'

not like

Attribute value does not start with the specified constant.
Note: The not like operator is similar to does-not-start-with and only supports % at the end of the string.

select * from mydomain where author not like 'Henry%'
between Attribute value falls within a range, including the start and end value. select * from mydomain where year between '1998' and '2000'
in Attribute value is equal to one of the specified constants. select * from mydomain where year in('1998','2000','2003')
is null

Attribute does not exist. If an item has the attribute with an empty string, it is not returned.
Note: Due to performance issues, this operator is not recommended when most items have the specified attribute.

select * from mydomain where year is null
is not null Attribute value contains any value. select * from mydomain where year is not null
every()

For multi-valued attributes, every attribute value must satisfy the constraint.
Note: Due to the cost of running more complex queries, this operator is only recommended for multi-valued attributes.

select * from mydomain where every(keyword) = 'Book'
select * from mydomain where every(keyword) like '***%'

Query syntax also allows for a combination of and and or operators between comparisons. There are two important things to keep in mind about multiple comparisons within the predicate:

  • All operators are evaluated in left-to-right order.
  • Users can specify up to 5 comparisons within a single predicate.

Amazon SimpleDB also supports set operations between predicates. It is important to understand the difference between comparison operators within the predicate and set operations between them: comparison operators can be thought of as definitions of a search range over a single attribute value range, while set operators execute against sets of resulting item names for each predicate and do not take attribute values into account at all.

Note: Negation (not) is not a set operation. It only applies negation to the comparison operator. For example, select * from mydomain where not  year = '2007' is the same as select * from mydomain where year != '2007'.

Amazon SimpleDB Set Operators

Set Operator

Description

Example

intersection

Determine the intersection of item name sets

select * from mydomain where `first name` = 'John' and `last name` = 'Smith'

The best way to setup your application for executing queries against Amazon SimpleDB is to follow the steps outlined in the Amazon SimpleDB Getting Started Guide:

https://docs.amazonwebservices.com/AmazonSimpleDB/latest/GettingStartedGuide/

Developers should also refer to the Amazon SimpleDB Developer Guide for technical details on the service:

https://docs.amazonwebservices.com/AmazonSimpleDB/latest/DeveloperGuide/

Sample Dataset

Item Name

Title

Author

Year

Number of Pages

Keywords

Rating

0385333498 The Sirens of Titan Kurt Vonnegut 1959 00336

Book

Paperback

*****

5 stars

Excellent

0802131786 Tropic of Cancer Henry Miller 1934 00318

Book

****

1579124585 The Right Stuff Tom Wolfe 1979 00304

Book

Hardcover

American

****

4 stars

B000T9886K In Between Paul Van Dyk 2007

CD

Trance

4 stars

B00005JPLW 300 Zack Snyder 2007

DVD

Action

Frank Miller

***

3 stars

Not bad

B000SF3NGK Heaven's Gonna Burn Your Eyes Thievery Corporation 2002 *****

The following examples show some basic queries using comparison operators.

Simple Query Examples

Query Expression

Description

Result Set

select * from mydomain where Title = 'The Right Stuff' Retrieves all items where the attribute "Title" equals "The Right Stuff." 1579124585
select * from mydomain where Year > '1985'

Retrieves all items where "Year" is greater than "1985."
Although this looks like a numerical comparison, it is lexicographical. Because the calendar won't change to five digits for nearly 8,000 years, "Year" is not zero padded.

B000T9886K, B00005JPLW, B000SF3NGK
select * from mydomain where Rating like '****%'

Retrieves all items that have at least a 4 star (****) rating.
The prefix comparison is case-sensitive and exact and does not match attributes that only have the "4 star" value, such as item B000T9886K.
Note: The like operator is similar to starts-with and only supports % at the end of the string.

0385333498, 1579124585, 0802131786, B000SF3NGK
select * from mydomain where Pages < '00320'

Retrieves all items that have less than 320 pages.
This attribute is zero padded in the data set and the select expression, which allows for proper lexicographical comparison between the strings. Items without this attribute are not considered.

1579124585, 0802131786,

As you can see, writing simple query expressions is a straightforward and intuitive task. The important thing to remember is the lexicographical nature of comparison operators and the attribute composition of your dataset.

Range Queries

Amazon SimpleDB enables users to execute more than one comparison against the attribute values within the same predicate. One of the common examples of this functionality is when you want to specify a range of values.

Range Query Examples

Query expression

Description

Result set

select * from mydomain where Year > '1975' and Year < '2008' Retrieves all items that have a "Year" value between "1975" and "2008." 1579124585, B000T9886K, B00005JPLW, B000SF3NGK
select * from mydomain where Year between '1975' and '2008' Retrieves all items that have a "Year" value between "1975" and "2008." 1579124585, B000T9886K, B00005JPLW, B000SF3NGK
select * from mydomain where Rating = '***' or Rating = '*****'

Retrieves all items that have 3 (***) or 5 (*****) star rating

This is a discontiguous range query that consists of two distinct values selected from the range of all possible values for the attribute.

0385333498, B00005JPLW, B000SF3NGK
select * from mydomain where (Year > '1950' and Year < '1960') or Year like '193%' or Year = '2007' Retrieves all items where the "Year" attribute is either between "1950" and "1960," falls in the nineteen-thirties, or equals "2007." 0385333498, 0802131786, B000T9886K, B00005JPLW

Multi-Valued Attribute Queries

One of the major differences of Amazon SimpleDB when compared to the traditional RDBMS systems is that it allows for multiple values to be associated with the same attribute name. In the Internet world, attributes such as “tag” or “keyword” may and often do contain multiple values. Amazon SimpleDB data model and query language make it very easy to support such attributes.

There are several important things to understand about predicate execution against attributes with multiple values:

  1. Each attribute value is considered individually against the comparison conditions defined in the predicate.
  2. Item names will be selected if ANY of the values match the predicate condition

Multi-Valued Attribute Query Examples

Query expression

Description

Result set

select * from mydomain where Rating = '4 stars' or Rating = '****'

Retrieves all items with a 4 star (****) rating.

The data set has this rating stored as both "4 stars" and "****." Amazon SimpleDB returns items that have either or both.

1579124585, 0802131786, B000T9886K
select * from mydomain where Keyword = 'Book' and Keyword = 'Hardcover'

Retrieve all items that have the Keyword attribute as both "Book" and "Hardcover."

Based on the data set, you might be surprised that the result did not return the "1579124585" item. As described earlier, each value is evaluated individually against the predicate expression. Since neither of the values satisfies both comparisons defined in the predicate, the item name is not selected.

To get the desired results, you can use the select * from mydomain where Keyword = 'Book' intersection Keyword = 'Hardcover' expression. For more information, see Multiple Attribute Queries.

select * from mydomain where every(keyword) in ('Book', 'Paperback')

Retrieves all items where the only keyword is Book or Paperback. If the item contains any other keyword entries, it is not returned.

0385333498, 0802131786

Multiple Predicate Queries

The previous examples show how to create expressions for single predicates. The Amazon SimpleDB query language also supports constructing expressions across multiple predicates using the intersection operator.

Multiple attribute queries work by producing a set of item names from each predicate and applying the intersection operator. The intersection operator only returns item names that appear in both result sets.

This section shows multiple attribute queries and their results.

The following table shows some multiple attribute queries, how they are interpreted, and the results they return from the sample dataset.

Multiple Predicate Examples

Query Expression

Description

Result Set

select * from mydomain where Rating = '****'

Retrieves all items that have a "****" Rating.

0802131786, 1579124585
select * from mydomain where every(Rating) = '****'

Retrieves all items that only have a "****" Rating. Items are not returned that have a multi-valued Rating attribute that contains any value other than '****.'

0802131786
select * from mydomain where Keyword = 'Book' intersection Keyword = 'Hardcover'

Retrieves all items that have a "Book" Keyword and a "Hardcover" Keyword.

The first predicate produces 0385333498, 0802131786, and 1579124585. The second produces 1579124585. The intersection operator returns results that appear in both queries.

1579124585

Sorting Data

Amazon SimpleDB supports sorting data on a single attribute, in ascending (default) or descending order. This section describes how to sort the result set returned from Select.

Note: All sort operations are performed in lexicographical order.
The sort attribute must be present in at least one of the predicates of the expression.
Sort cannot be applied to expressions that contain the is null predicate operator.

The following table shows sort queries, how they are interpreted, and the results they return from the sample dataset.

Query Expression

Description

Result Set

select * from mydomain where Year < '1980' order by Year asc

Retrieves all items released before 1980 and lists them in ascending order.

0802131786, 0385333498, 1579124585
select * from mydomain where Year < '1980' order by Year

Same as the previous entry, with "asc" (ascending) omitted.

0802131786, 0385333498, 1579124585
select * from mydomain where Year = '2007' intersection Author is not null order by Author desc

Retrieves all items released in 2007 and sorts them by author name in descending order.

B00005JPLW, B000T9886K
select * from mydomain order by Year asc

Invalid because Year is not constrained by a predicate in the where clause.

400 Error
select * from mydomain where Year < '1980' order by Year limit 2

Retrieves two items that were released before 1980 and lists them in ascending order.

0802131786, 0385333498

Counting Data

If you want to count the number of items in a result set instead of returning the items, use count(*). Instead of returning a list of items, Amazon SimpleDB returns a single item called Domain with a Count attribute.

Note: If the count request takes more than five seconds, Amazon SimpleDB returns the number of items that it could count and a next token to return additional results. The client is responsible for accumulating the partial counts.
If Amazon SimpleDB returns a 408 Request Timeout, please resubmit the request.
The default result limit of 100 and maximum result limit of 250 do not apply to count(*). However, you can restrict the maximum number of counted results using the limit clause.

The next token returned by count(*) and select are interchangeable as long as the where and order by clauses match. For example, if you want to return the 200 items after the first 10,000 (similar to an offset), you can perform a count with a limit clause of 10,000 and use the next token to return the next 200 items with select.

The following table shows count(*) queries and the results they return from the sample dataset.

Query Expression

Description

Result Set

select count(*) from mydomain where Title = 'The Right Stuff' Counts all items where the attribute "Title" equals "The Right Stuff." 1
select count(*) from mydomain where Year > '1985'

Counts all items where "Year" is greater than "1985."

3
select count(*) from mydomain limit 500 Counts all items in the domain, with a limit of 500. 6
select count(*) from mydomain limit 4 Counts all items in the domain, with a limit of 4. 4