Z-order indexing for multifaceted queries in Amazon DynamoDB: Part 2
In a previous AWS Database Blog post, I introduced Z-order indexing, a way in which you can sort your data to efficiently query an Amazon DynamoDB table by using range bounds on multiple attributes. In this post, we explore the process of creating a schema for your index. We look at how to decide which attributes to include in your schema, how your index’s schema impacts query efficiency, and how to work with a variety of data types.
This post builds on concepts that are described in Part 1, so I recommend taking some time to review it before diving in.
Creating a Z-order index schema
As I discussed in the previous post, each record that is stored in the index is assigned a Z-address. A Z-address is a fixed-length binary value that you can find by interleaving the bits of a known set of the record’s attributes. The following diagram shows the calculation of a Z-address with two attributes,
The Z-address is used as the sort key (or local secondary index key) in a DynamoDB table. Sort keys serve two purposes: They sort the table’s data at rest to allow for fast, bounded queries, and they uniquely identify a record within a partition. To learn more about partition and sort keys, see Primary Key in the DynamoDB documentation.
A schema describes the structure of the information that is stored in a table or index. In the context of a Z-order index, a schema includes the following:
- The names of the attributes to be indexed (
yin the preceding diagram).
- The order in which the attributes will be encoded (
- The attributes’ respective data types (
yare unsigned integers, whole numbers greater than or equal to 0) and how to represent them in binary.
Each time that you insert a new record into a table or update an existing record, you have to calculate that record’s Z-address. Likewise, when you create a query, you have to calculate the Z-addresses for a minimum record and a maximum record that represent our range bounds. Each time you calculate a Z-address, you need to encode the same attributes in the same order, taking care to treat those attributes’ data types consistently between calculations. The schema dictates which operations you should perform and in which order.
Note that a Z-order index schema is different from a DynamoDB table KeySchema, which defines the primary key on your DynamoDB table. From here on, when I use the word “schema,” I am referring to a Z-order index schema, not a DynamoDB table
Selecting attributes to include in your schema
Unlike traditional composite indexes, searches against Z-order indexes let you query against any subset of the attributes that they include. However, you shouldn’t use every attribute from your data to build your index.
Z-order indexes are only as helpful as the bounds provided by your queries. If your query defines narrow range bounds for each attribute in the index, the query will be executed quickly. However, as I showed in my previous post, query performance degrades if you leave several attributes unbound. The fewer restrictions you have in your queries, the less the index can narrow your search.
To help us reason about this tradeoff in greater detail, let’s take a look at the math that underlies it. All of the Z-addresses stored in the same index are required to have the same number of bits, and no two records can share the same Z-address. This means that an index whose addresses are 32 bits long can hold 232 (or 4,294,967,296) different records. The range of possible Z-addresses that your index could store is called the Z-address search space.
Each Z-address in the search space may or may not contain a record. When you execute a query, you start by identifying ranges of Z-addresses that might be populated with records that you’re interested in. You then ask the database to visit those Z-addresses to see if any records were present. Visiting Z-addresses takes time and consumes read capacity units. The more Z-addresses that you have to visit to execute a query, the worse its performance will be.
To estimate how well a query will perform for a given index, you can calculate the portion of the Z-address search space that you’re going to have to visit by using the following formula:
Number of Z-addresses to visit =
In this formula:
nis the number of addresses in the search space.
ais the number of attributes included in the index. This formula makes the simplifying assumption that each attribute in the index is represented with the same number of bits.
bis the number of attributes with an exact bound provided by the query. An exact bound limits a dimension to a single possible value (for example,
x = 7) rather than a range of possible values (for example,
7 <= x <= 10). You can use ranges with this formula by assigning
ba fractional value that indicates how much of that dimension’s search space will be eliminated by its bound.
Note that this formula tells you the portion of the search space that must be visited, and not the portion of your records. How close these two amounts are in practice depends on how evenly distributed the items in your table are with respect to the search space.
As an illustration, let’s look at an index composed of four attributes:
d. Each attribute is an unsigned integer and is represented by 2 bytes. This means that the Z-addresses are 8 bytes each, creating a 64-bit search space.
The following table shows the number of Z-addresses that must be explored for a series of increasingly broad example queries.
|Query||Dimensions with an exact bound||Formula||Simplified||Z-addresses to explore|
a = 5 AND
b = 2 AND
c = 8 AND
d = 1
a = 5 AND
b = 2 AND
c = 8
a = 5 AND
b = 2
|a = 5||1||(264)((4-1)/4)||248||281,474,976,710,656|
As you can see in the preceding table, leaving dimensions unbound causes the index’s utility to decline sharply. To minimize this effect, you want to make sure that you only include attributes in the index that are going to carry their weight for most queries. To do this, make a list of the queries that you plan to run against your data. While reviewing the list of queries, identify which attributes are going to have bounds that are both of the following:
- Frequently defined. It’s okay for an attribute’s value to be open-ended in some of your queries—that flexibility is part of what makes Z-order indexes so versatile. But if an attribute in an index is unbound more often than it’s bound, it can end up being a liability to overall query performance.
- Highly selective. Bounds that are placed on attributes in an index should eliminate large swaths of the search space. It’s not worth including an attribute if its bounds aren’t going to exclude most possible results. As an example, consider a Boolean attribute that’s only true for 5 percent of your records—something like an
isPlatinumMemberflag in a
Userstable. If your queries apply only to Platinum members, this attribute will be highly selective. However, if you’re typically looking for non-Platinum members, having this one-bit, true-or-false attribute in your index will double your search space while only eliminating 5 percent of possible values when it’s defined.
Selecting binary representations for your attributes
Now that you’ve selected which attributes to include in the index, you need to decide how to represent those attribute values in binary. Unfortunately, this isn’t as easy as using the representation employed by your programming language of choice. If you want DynamoDB to sort your records by their Z-addresses, you must encode your attribute data in a way that works with the existing DynamoDB binary sort key sorting behavior.
DynamoDB supports sort keys as large as 1,024 bytes and sorts those keys lexicographically. This means that the database compares each binary key’s bytes one at a time from left to right, looking for the first byte value that the keys do not have in common. When the database finds a difference in the byte values, the key that had the lower byte value is placed first in the list. For example, consider the following comparison of two 4-byte binary sort keys.
|Byte 0||Byte 1||Byte 2||Byte 3|
|Binary sort key A||0111 1010||0110 0001||0110 0011||0110 1011|
|Binary sort key B||0111 1010||0110 0001||0110 1011||0110 1011|
When sorting the two keys in the preceding table, DynamoDB does the following:
- Visits byte 0 of both key A and key B (A0 and B0, respectively) and compares the bytes that are stored there.
- Because A0 and B0 are equal (
01111010 == 01111010), DynamoDB moves to byte 1.
- Because A1 and B1 are equal (
01100001 == 01100001), DynamoDB moves to byte 2.
- Because A2 is less than B2 (
01100011 < 01101011), DynamoDB stores the record with key A in the database before the record with key B.
If you use a Z-order index in DynamoDB, the attribute that acts as your table’s sort key is a Z-address. For everything to work as expected, Z-addresses must have a meaningful lexicographical order. Happily, it turns out that if each dimension in a Z-address can be sorted lexicographically, the Z-address itself can be as well.
All of the examples that we’ve seen so far (including those in the previous post) have looked at Z-order indexes whose dimensions were unsigned integers. This is because you can sort unsigned integers lexicographically and produce the same ordering as if you had sorted them numerically. This property allowed me to use a commonly understood binary representation to demonstrate how Z-order indexes work. However, you might want to include other data types in a Z-order index, and many of them cannot be represented using the same binary layout that you might use in a computer program. Let’s take a look at some possible ways to work with these data types.
Two’s complement notation is a common binary representation for signed integers. Unlike unsigned integers, sorting a collection of signed integers’ binary representations lexicographically produces an ordering that is very different from their numerical order. As the following table shows, negative integers appear after positive ones when sorted lexicographically.
This ordering is a problem. You need to be able to interleave the signed integers’ bytes with those of the other indexed attributes to produce the record’s Z-address. Unfortunately, if the signed integers’ binary representations can’t be sorted lexicographically, the Z-addresses that you create with them can’t be sorted lexicographically either.
Fortunately, Z-order indexes have a property that lets you work around this ordering issue. The binary representations that you choose for each dimension don’t need to represent the encoded value; they only need to maintain that dimension’s natural order when sorted lexicographically. Let’s explore this concept.
When you encode a value in binary to add its interleaved bits to a Z-address, you don’t have to be able to reverse the process. For example, imagine that you’re storing the following document in your table:
You decide to create a Z-order index that includes all of the attributes, so you add a
z_address attribute to the document to store the computed binary value.
The sole responsibility of the
z_address attribute is to enable DynamoDB to order the table, facilitating B-tree lookup operations. After you retrieve this document from the database, you no longer have a use for the contents of the
z_address attribute. The original dimensional values that you used to create it are stored alongside it in their native formats and are readily available, so you don’t need to try to extract the dimensional values from the Z-address. This means that the binary representation of a value that you interleave into the Z-address can be whatever you want it to be, as long as its lexicographical ordering is the same as the data type’s natural ordering.
To clarify this further, let’s return to the task of encoding signed integers. You can’t use the integer’s two’s complement representation directly because its lexicographical ordering isn’t the same as its numerical ordering. However, you can use a mapping function to take the two’s complement representation of the value and flip the first bit. As the following table illustrates, flipping the first bit of the integer’s binary representation and reinterpreting the resulting value as an unsigned integer produces the expected order.
|Decimal value||Two’s complement signed integer||With first bit flipped||Resulting unsigned integer decimal value|
|0||0000 0000||1000 0000||128|
|1||0000 0001||1000 0001||129|
|2||0000 0010||1000 0010||130|
|126||0111 1110||1111 1110||254|
|127||0111 1111||1111 1111||255|
|-128||1000 0000||0000 0000||0|
|-127||1000 0001||0000 0001||1|
|-126||1000 0010||0000 0010||2|
|-2||1111 1110||0111 1110||126|
|-1||1111 1111||0111 1111||127|
The original signed value is lost in the mapping, but that’s not a problem because we only require that the ordering property holds. The original signed value is still available on the record associated with this Z-address if and when we need it.
Floating point values
The IEEE 754 specification for binary floating-point arithmetic states the following:
“If two floating-point numbers in the same format are ordered (say, x < y), they are ordered the same way when their bits are reinterpreted as sign-magnitude integers.”
This means that when sorting, you can treat the floating-point values’ binary representations as though they were sign-magnitude integers (a description follows). If you can find a way to map sign-magnitude integers to a binary representation that’s lexicographically sortable, that same mapping will work for IEEE 754-compliant floating-point values.
Sign-magnitude representation uses the first bit of an integer to indicate whether its value is positive or negative. The remaining bits are used to indicate the magnitude of the number. The following table shows some example encodings.
This encoding gets you close to a usable ordering, but negative numbers are being treated as larger than positive numbers. Also, negative numbers with bigger magnitudes are being treated as larger than negative numbers with smaller magnitudes (for example, -2 is being treated as greater than -1). You can address this by doing the following:
- Flipping the first bit in the binary representation of the numbers that are positive.
- Flipping all the bits in the binary representation of the numbers that are negative.
- Interpreting the resulting value as an unsigned integer.
|Decimal value||Sign-magnitudesigned integer||With bits flipped||Resulting unsigned decimal value|
|0||0000 0000||1000 0000||128|
|1||0000 0001||1000 0001||129|
|2||0000 0010||1000 0010||130|
|-2||1000 0010||0111 1101||125|
|-1||1000 0001||0111 1110||126|
Voila! By applying this mapping function to the floating-point values in the index, you can align the values’ lexicographical and numerical orderings.
Fortunately, UTF-8 encoded text is already lexicographically sortable. Also, because UTF-8 is backward compatible with ASCII encoding, both ASCII strings and sequences of multibyte code points can be sorted as is and produce the expected ordering. However, if you want to include a text attribute in your Z-address schema, all the values for that attribute must be the same number of bytes. This means that you need a mapping function that can take a string of an arbitrary length and produce a fixed-length byte sequence that preserves the string’s lexicographical ordering.
Imagine that you want to add a text attribute to your Z-order index schema that always contains a single word from the English language. You decide to represent that attribute using 4 bytes. When encoded using UTF-8, some English words are shorter than 4 bytes, and many are longer.
To produce a 4-byte value for each word, follow these steps:
- If the word is exactly 4 bytes long, use the word’s UTF-8 representation as is.
- If the word is shorter than 4 bytes long, add bytes with a value of zero to the end of the word’s UTF-8 representation until it is exactly 4 bytes.
- If the word is longer than 4 bytes, use only the first 4 bytes of its UTF-8 representation.
Adding or removing bytes in this manner means that the 4-byte value that you end up with might no longer be a valid UTF-8 value. That’s okay—you only need to produce a binary representation that will have a meaningful lexicographical ordering.
|Word||UTF-8 bytes (hexadecimal)||Modified 4-byte value|
|car||63 61 72||63 61 72 00|
|cart||63 61 72 74||63 61 72 74|
|cartographer||63 61 72 74 6f 67 72 61 70 68 65 72||63 61 72 74|
|carton||63 61 72 74 6f 6e||63 61 72 74|
Notice that this approach can cause words with a common prefix to have the same 4-byte value. This is cause for caution. By truncating each UTF-8 string to 4 bytes, you lose a great deal of precision. As far as the index is concerned, there is no difference between the entries “cart,” “carton,” and “cartographer.”
This collision of 4-byte values won’t impact the correctness of your queries, but it does limit how helpful the index can be. For example, if you run a range query on your word attribute using a minimum value of “candy” and a maximum value of “cartographer,” the index will return all the words that appear in that range lexicographically. However, the index also will return any words that had the same 4-byte value as the minimum and maximum words themselves. That means that the index’s result set will include the word “candor” (which has the same 4-byte value as the query’s minimum value, “candy”) and “carton” (which has the same 4-byte value as the query’s maximum value, “cartographer”).
The query’s FilterExpression will prevent these erroneous values from being returned by the API, but this filtering happens after the values have already been read from the database. Though the client running the query will get correct results, the database will have consumed some read capacity units accessing those incorrect values.
Most of the time, this is not a big problem. If the text that is stored has a relatively even distribution of prefixes, the number of value collisions should be manageable. However, imagine a table that includes a
referrer_url attribute in its index. Almost all of that attribute’s values will begin with
http://www or similar, leading to widespread collisions. In this case, the index would be nearly useless—you would consume large amounts of read capacity for each query no matter how precise your bounds on
It is worth mentioning that a number of other challenges come with Unicode text that are not specific to Z-order indexing. The same text can often be encoded in a variety of ways, potentially leading to what appears to be the same string of text appearing in different places in your table. To learn more about this concept, see Unicode normalization forms.
You can represent time stamps using a variety of encodings. When you write time stamps as text, the ISO 8601 date and time interchange format is designed to be lexicographically sortable when using a consistent format and precision. For example, if you always write the date using the format
YYYY-MM-DD, the values are always lexicographically sortable. However, if some of your dates are written in the format
YYYY-MM-DD, and others are written using week dates (for example,
YYYY-Www-D) or ordinal dates (
YYYY-DDD), trying to sort them will produce unpredictable results.
Using epoch time stamps—the number of milliseconds since January 1, 1970—is also an option. When epoch time stamps are represented as unsigned integers, no special mapping logic is required.
In this post, we explored the process of creating a schema for your index. We looked at how to decide which attributes to include in your schema, how your index’s schema impacts query performance, and how to work with a variety of data types. Specifically, we showed the following:
- You can easily estimate how efficiently a query will execute on a given Z-order index by computing the percentage of the total search space that the query eliminates.
- The attributes that you include in Z-order indexes should be both frequently defined and highly selective.
- When using a binary sort key, DynamoDB sorts the keys lexicographically as unsigned bytes.
- If the binary representation of each attribute that is included in your Z-addresses can be sorted lexicographically, the Z-addresses themselves can be sorted lexicographically, too.
- You can use a variety of data types in an index as long as the binary representations’ lexicographical order is the same as the data types’ natural order.
About the Author
Zack Slayton is a software development engineer at Amazon. He works on the open source Ion data serialization format, which was built to address rapid development, decoupling, and efficiency challenges associated with large-scale, service-oriented architectures.