AWS Mobile Blog

Geo Library for Amazon DynamoDB – Part 1: Table Structure

Geo Library for Amazon DynamoDB supports geospatial indexing on Amazon DynamoDB datasets. The library takes care of managing Geohash indexes. You can use these indexes for fast and efficient execution of location-based queries over DynamoDB items representing points of interest (latitude/longitude pairs). Some features of this library are:

  • Life Cycle Operations: Create, retrieve, update, and delete geospatial data items.
  • Query Support: Box queries return items that fall within the rectangle defined by a pair of geo points, as projected on a sphere. Radius queries return items that fall within a given distance of a geo point.
  • Easy Integration: This library extends the AWS SDK for Java, making it easy to use from your existing Java applications on AWS.

To help you get started, we have added an AWS Elastic Beanstalk application and a sample iOS project which you can get from GitHub. You can follow the Getting Started section of README.md and run the sample apps to find out what Geo Library for Amazon DynamoDB offers.

Geo Library for Amazon DynamoDB automatically generates values for Geohash, GeoJSON, Hash Key, and Range Key attributes in your table and uses them for querying. When you run the sample app, you will see those attributes in your table. In this post, I will briefly explain what they are and how the library uses them.

Geohash

When Geo points are inserted into a DynamoDB table, the library computes a Geohash for each point and uses it to map the data record to the correct grid cell. The library stores each item’s Geohash as an item attribute. The hash preserves the proximity of nearby points, which makes retrieval efficient; the attribute is indexed by a local secondary index.

Conceptually, Geohash is computed as follows:

  • Divide the planet earth into six cells, like the six faces of a cube – e.g., Cell A, Cell B, Cell C, Cell D, Cell E, and Cell F.

[Figure: the six cells, A through F]

  • Each cell has four child cells. In this example, Cell A has four children: 1, 2, 3, and 4.

[Figure: Cell A and its four child cells]

  • Each child cell also has four child cells. For instance, Cell 2 has these child cells: 21, 22, 23, and 24.

[Figure: Cell A, with Cell 2 subdivided]

  • Cell 22 also has four child cells: 221, 222, 223, and 224.

[Figure: Cell A, with Cell 22 subdivided]

  • The Geohash of the red circle in the picture is computed as A224.

[Figure: Cell A, with the point marked by a red circle]

  • Geohash roughly preserves the proximity of Geo points since close points likely share the same prefix. For example, these three red dots, A221, A223, and A224, share A22 as a prefix.

[Figure: Cell A, with three red dots in cells 221, 223, and 224]
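
The subdivision described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the library's code: the function name cell_id is hypothetical, the point is given as (x, y) coordinates inside the unit square of one cube face, and the numbering of the four children (1 = lower-left, 2 = lower-right, 3 = upper-left, 4 = upper-right) is an assumption, since the original figures are not available. Mapping a latitude/longitude pair onto a cube face is more involved and is not shown.

```python
def cell_id(face, x, y, levels):
    """Return an ID such as 'A224' for a point (x, y) in the unit square of a face.

    Assumed child numbering (illustrative only):
    1 = lower-left, 2 = lower-right, 3 = upper-left, 4 = upper-right.
    """
    if not (0.0 <= x < 1.0 and 0.0 <= y < 1.0):
        raise ValueError("x and y must be in [0, 1)")
    digits = []
    for _ in range(levels):
        # Double the coordinates so the integer part says which half we are in.
        x, y = x * 2, y * 2
        col, row = int(x), int(y)
        digits.append(str(1 + col + 2 * row))
        # Keep the fractional part: the position inside the chosen child cell.
        x, y = x - col, y - row
    return face + "".join(digits)


near_a = cell_id("A", 0.9, 0.2, 3)  # 'A224' under the assumed numbering
near_b = cell_id("A", 0.8, 0.1, 3)  # 'A221' under the assumed numbering
far = cell_id("A", 0.1, 0.9, 3)     # a point in the opposite corner
```

With this numbering, the two nearby points share the prefix A22, while the distant point diverges at the first digit.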

The library uses 63 bits to represent the Geohash and stores it in the table as a 64-bit long value. This attribute is indexed by a local secondary index and is used for query requests.
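
To make the table layout concrete, here is a minimal sketch of the parameters one might pass to the DynamoDB CreateTable operation for such a table. The attribute names (hashKey, rangeKey, geohash), the index name, and the throughput values are illustrative assumptions, not values confirmed by this post. The sketch only builds the parameter dictionary, so it runs without AWS credentials.

```python
def geo_table_params(table_name, read_units=5, write_units=5):
    """Build CreateTable parameters for a geo-indexed table (illustrative names)."""
    return {
        "TableName": table_name,
        # Only key attributes are declared; other attributes are schemaless.
        "AttributeDefinitions": [
            {"AttributeName": "hashKey", "AttributeType": "N"},
            {"AttributeName": "rangeKey", "AttributeType": "S"},
            {"AttributeName": "geohash", "AttributeType": "N"},
        ],
        # Primary key: hash key plus a user-defined range key.
        "KeySchema": [
            {"AttributeName": "hashKey", "KeyType": "HASH"},
            {"AttributeName": "rangeKey", "KeyType": "RANGE"},
        ],
        # The local secondary index shares the hash key and sorts by geohash.
        "LocalSecondaryIndexes": [
            {
                "IndexName": "geohash-index",
                "KeySchema": [
                    {"AttributeName": "hashKey", "KeyType": "HASH"},
                    {"AttributeName": "geohash", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    }


params = geo_table_params("geo-test")
```

A local secondary index must use the same hash key as the table, which is why the Geohash can only appear as the index's range key.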

GeoJSON

The GeoJSON attribute contains a string representation of a Geo point, a latitude and longitude pair, in GeoJSON format. You can retrieve the latitude and longitude of a Geo point by parsing this JSON string. Here is an example:

{ "type": "Point", "coordinates": [47.61121, -122.31846] }

Because Geo Library for Amazon DynamoDB currently supports only point data, the type member is always "Point". The coordinates member holds the latitude and longitude pair as an array.
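
As a small illustration, the string above can be parsed with any JSON parser. The sketch below uses Python's standard json module; the function name parse_point is hypothetical. It reads the array as latitude followed by longitude, as described in this post. Note that the GeoJSON specification (RFC 7946) lists longitude first, so take care when passing these values to other GeoJSON tools.

```python
import json


def parse_point(geo_json):
    """Return (latitude, longitude) from a GeoJSON string as described in this post."""
    doc = json.loads(geo_json)
    if doc.get("type") != "Point":
        raise ValueError("only Point geometries are expected")
    # Order follows this post: latitude first, then longitude.
    latitude, longitude = doc["coordinates"]
    return latitude, longitude


lat, lon = parse_point('{ "type": "Point", "coordinates": [47.61121, -122.31846] }')
```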

Hash Key

By default, the hash key is the first 6 digits of the Geohash. For instance, if your Geohash is 6093522776912656819, the hash key will be 609352. For -6093522776912656819, the hash key will be -609352. The length of the hash key is configurable.
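
The rule above can be written as a short function. This is a sketch that reproduces the two examples in this post, not the library's own implementation; the name hash_key is hypothetical.

```python
def hash_key(geohash, length=6):
    """Return the leading `length` digits of a geohash, keeping its sign."""
    digits = str(abs(geohash))
    prefix = int(digits[:length])
    return -prefix if geohash < 0 else prefix


positive = hash_key(6093522776912656819)     # 609352
negative = hash_key(-6093522776912656819)    # -609352
longer = hash_key(6093522776912656819, 7)    # 6093522
```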

To achieve the full amount of request throughput you have provisioned for a table, keep your workload spread evenly across the hash key values. A longer hash key yields more distinct hash key values, so it distributes the data more evenly. For more details, please read Design for Uniform Data Access Across Items in Your Tables.

On the other hand, a longer hash key means you may have to issue more Query requests. For example, suppose you want to retrieve two points, 6093522776912656819 and 6093523776912656819.

Hash Key Length 6

One query can retrieve both of the points.

Query      Hash Key   Geohash Query Condition
Query 1    609352     BETWEEN 6093520000000000000 AND 6093529999999999999

Hash Key Length 7

You need two queries to retrieve both of the points.

Query      Hash Key   Geohash Query Condition
Query 1    6093522    BETWEEN 6093522000000000000 AND 6093522999999999999
Query 2    6093523    BETWEEN 6093523000000000000 AND 6093523999999999999
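
The effect of the hash key length on the number of queries can be reproduced with a short sketch. The function queries_for is hypothetical and makes simplifying assumptions: all geohashes are non-negative and have the same number of digits, and each range spans the whole hash key, as in the tables above. A real query would normally narrow the range to the area being searched.

```python
def queries_for(geohashes, length):
    """Return one (hash_key, low, high) tuple per distinct hash key.

    Assumes non-negative geohashes that all have the same number of digits.
    """
    plans = {}
    for g in geohashes:
        if g < 0:
            raise ValueError("this sketch handles non-negative geohashes only")
        # Power of ten covering the digits that follow the hash key prefix.
        scale = 10 ** max(len(str(g)) - length, 0)
        key = g // scale
        plans[key] = (key * scale, key * scale + scale - 1)
    return sorted((key, low, high) for key, (low, high) in plans.items())


points = [6093522776912656819, 6093523776912656819]
with_length_6 = queries_for(points, 6)  # one query covers both points
with_length_7 = queries_for(points, 7)  # each point needs its own query
```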

In Amazon DynamoDB, an item collection is any group of items that have the same hash key, in a table and all of its local secondary indexes, and the maximum size of any item collection is 10 GB. So your hash key needs to be long enough that no single item collection exceeds this limit. Read Watch for Expanding Item Collections for more details.

To summarize:

                             Longer Hash Key             Shorter Hash Key
Hash Key Distribution        [ + ] More distributed      [ - ] Less distributed
Query Efficiency             [ - ] More Query requests   [ + ] Fewer Query requests
Item Collection Limitation                               [ ! ] Be aware of the 10 GB limit

The optimal hash key length depends on your dataset and query patterns, so please thoroughly investigate your expected dataset and figure out the best hash key length. Once you set the hash key length, it’s not easy to change.
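
One way to investigate a dataset before committing to a length is to count how many items would share each hash key for several candidate lengths. The sketch below is a rough aid under stated assumptions: the function name is hypothetical, and the sample values are made up for illustration (the first two are the geohashes used earlier in this post). In practice you would feed it geohashes computed from a representative sample of your own data.

```python
from collections import Counter


def items_per_hash_key(geohashes, length):
    """Count how many geohashes fall under each hash key of the given length."""
    counts = Counter()
    for g in geohashes:
        prefix = int(str(abs(g))[:length])
        counts[-prefix if g < 0 else prefix] += 1
    return counts


# Illustrative sample only; replace with geohashes from your own dataset.
sample = [
    6093522776912656819,
    6093523776912656819,
    6093611234567890123,
    -6093522776912656819,
]

for length in (5, 6, 7):
    counts = items_per_hash_key(sample, length)
    # Fewer keys means fewer queries; a large maximum means big item collections.
    print(length, len(counts), max(counts.values()))
```

Many distinct keys with a small maximum count suggest good distribution and small item collections, at the cost of more Query requests per search.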

Range Key

Geo Library for Amazon DynamoDB uses a range key to uniquely identify a row in a table. This is a user-defined attribute, so you can choose any method to generate a unique key.

In Amazon DynamoDB, the combination of hash key and range key must be unique; however, for this library, we strongly recommend that the range key alone guarantee uniqueness. This is because the hash key is automatically generated in order to distribute items across multiple partitions, and developers have limited control over its actual value. Your table architecture is simpler when the range key can uniquely identify a row.
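
A random UUID is one simple way to get a range key that is unique on its own. The sketch below shows what an item might end up containing; it is an illustration only. The library generates the hash key, Geohash, and GeoJSON attributes itself, and the attribute names and the new_item function here are hypothetical. The geohash argument is assumed to have been computed elsewhere.

```python
import json
import uuid


def new_item(latitude, longitude, geohash, hash_key_length=6):
    """Assemble an illustrative item with a range key that is unique by itself."""
    prefix = int(str(abs(geohash))[:hash_key_length])
    return {
        "hashKey": -prefix if geohash < 0 else prefix,
        # A random UUID does not depend on the generated hash key.
        "rangeKey": str(uuid.uuid4()),
        "geohash": geohash,
        "geoJson": json.dumps(
            {"type": "Point", "coordinates": [latitude, longitude]}
        ),
    }


first = new_item(47.61121, -122.31846, 6093522776912656819)
second = new_item(47.61121, -122.31846, 6093522776912656819)
```

Two items at the same location share a hash key and Geohash but still have distinct range keys, so each can be identified by its range key alone.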

In the coming weeks, we will be publishing additional blog posts on how to use Geo Library for Amazon DynamoDB in detail. As always, please leave a comment below if you have questions.

Further Reading