AWS Database Blog
New Amazon DocumentDB (with MongoDB compatibility) aggregation pipeline operators: $objectToArray, $arrayToObject, $slice, $mod, and $range
Amazon DocumentDB (with MongoDB compatibility) is a fast, scalable, highly available, and fully managed document database service that supports MongoDB workloads. You can use the same MongoDB application code, drivers, and tools as you do today to run, manage, and scale workloads on Amazon DocumentDB without having to worry about managing the underlying infrastructure.
Today, Amazon DocumentDB has added support for five additional aggregation pipeline capabilities that allow you to compose powerful aggregations over your documents. The new capabilities include the $objectToArray, $arrayToObject, $slice, $mod, and $range aggregation pipeline operators.
We are constantly listening to our customers to build the capabilities that they want the most, and improve compatibility for the capabilities they actually use. See the documentation for more about the supported MongoDB APIs and aggregation pipeline capabilities for Amazon DocumentDB.
In this blog post, I introduce you to some of these new aggregation operators using common use cases to give you a quick start guide so that you can start using these capabilities in Amazon DocumentDB today.
Array aggregation operators
Amazon DocumentDB documents can contain fields that have an array data type. Being able to query and manipulate arrays natively within the query language of the database can increase performance and simplify your application code by pushing the array manipulation down to the database. In this section, I show you how to use the four new array aggregation operators ($arrayToObject, $slice, $objectToArray, and $range).
$objectToArray
The $objectToArray aggregation operator converts an object (or document) into an array. The input to the operator is a document and the output consists of an array element for each field-value pair in the input document.
To understand how $objectToArray works, I use an example dataset that tracks the inventory of video tapes for a fictitious chain of video rental stores in Iowa. Although I initially modeled my data to categorize store inventory by a particular video title, there are times when I want to know which store has the greatest inventory of videos. To answer that question, I am going to use the $objectToArray aggregation operator.
Input:
Each document in the example dataset is a distinct video title and contains an embedded document that tracks the inventory for each video rental store location.
The query below uses multiple aggregation stages to answer the question that I posed: which store has the greatest inventory of videos? In the first stage, I use $objectToArray to convert the inventory document into an array so that I can more easily aggregate the store inventory. Below are the first stage and its output. As you can see, the inventory document is now an array of key-value pairs.
Query:
Result:
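As a rough illustration of this first stage, here is a minimal sketch. The document, the field names (name, inventory, videos), and the collection name in the comment are all assumptions made for this example, and the objectToArray helper is only a plain-JavaScript stand-in that mimics the shape of the operator's output; it is not how Amazon DocumentDB implements it.

```javascript
// Hypothetical document: one video title with an embedded per-store inventory.
const video = {
  _id: 1,
  name: "Live Soft",
  inventory: { "Des Moines": 1000, "Ames": 500 }
};

// The stage as it might be passed to db.videos.aggregate([...]);
// the collection name "videos" and the field names are assumptions.
const firstStage = {
  $project: { name: 1, videos: { $objectToArray: "$inventory" } }
};

// Plain-JavaScript stand-in for $objectToArray on one embedded document:
// every field becomes a { k: <field name>, v: <field value> } element.
function objectToArray(obj) {
  return Object.entries(obj).map(([k, v]) => ({ k, v }));
}

// Shape of the document after the first stage.
const firstStageOutput = {
  _id: video._id,
  name: video.name,
  videos: objectToArray(video.inventory)
};
console.log(JSON.stringify(firstStageOutput, null, 2));
```

Each store name ends up under "k" and its quantity under "v", which is what makes the grouping in the later stages possible.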
Now, to answer the original question of which store has the greatest inventory of videos, I’m going to $unwind the arrays from the first stage, group the resulting elements by city (that is, the key, or “k”), and sum the quantity (that is, the value, or “v”). Then, to return the store that has the greatest inventory, in my next stage I’m going to perform a descending $sort on the total value and use a final $limit stage to return only the top result.
Query:
Result:
The final result shows that Des Moines has the greatest inventory of videos with 3000. If I wanted to see the inventory for all stores, I would omit the final $limit stage from my aggregation query.
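The full sequence of stages described above could look roughly like the following sketch. The three documents are invented for illustration (chosen so that Des Moines totals 3000, matching the result described above), the field and collection names are assumptions, and the loop underneath is a plain-JavaScript emulation of what the stages compute rather than a call to the database.

```javascript
// Hypothetical dataset: one document per video title, inventory keyed by store.
const videos = [
  { name: "Live Soft", inventory: { "Des Moines": 1000, "Ames": 500 } },
  { name: "Top Pilot", inventory: { "Des Moines": 1000, "Ames": 400 } },
  { name: "Romancing the Rock", inventory: { "Ames": 600, "Cedar Rapids": 1000, "Des Moines": 1000 } }
];

// Pipeline as it might be passed to db.videos.aggregate(pipeline);
// collection and field names are assumptions.
const pipeline = [
  { $project: { name: 1, videos: { $objectToArray: "$inventory" } } },
  { $unwind: "$videos" },
  { $group: { _id: "$videos.k", total: { $sum: "$videos.v" } } },
  { $sort: { total: -1 } },
  { $limit: 1 }
];

// Emulation of $objectToArray + $unwind + $group: sum quantities per store.
const totals = new Map();
for (const doc of videos) {
  for (const [store, quantity] of Object.entries(doc.inventory)) {
    totals.set(store, (totals.get(store) || 0) + quantity);
  }
}

// Emulation of $sort (descending on total) followed by $limit: 1.
const ranked = [...totals]
  .map(([_id, total]) => ({ _id, total }))
  .sort((a, b) => b.total - a.total);
const topStore = ranked.slice(0, 1);

console.log(topStore); // [ { _id: 'Des Moines', total: 3000 } ]
```

Here, ranked plays the role of the pipeline without its final $limit stage: it holds the total for every store, ordered from largest to smallest.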
$arrayToObject
In contrast to $objectToArray, the $arrayToObject aggregation operator converts an array of key-value pairs into a single document. As input, $arrayToObject expects an array whose elements are already represented as key-value pairs. For example, as you saw in the output of the previous example:
The $arrayToObject operator is the inverse of the $objectToArray operator. I can use $arrayToObject to take the output from the query above and return it to its original form.
Query:
Result:
Additionally, I can take arrays that are organized as key-value pairs and turn them into a document. Below is a sample dataset of a fishing report from a day out on Odell Lake.
Input:
With the $arrayToObject operator, you can convert the array of key-value pairs, which records which fish were caught and how many, into a single document.
Query:
Result:
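To make both uses of $arrayToObject concrete, here is a minimal sketch. The fishing report (its field names, species, and counts) is invented for illustration, the collection name in the comment is an assumption, and the two helper functions are plain-JavaScript stand-ins that only mimic the shape of each operator's output.

```javascript
// Hypothetical fishing report; field names and values are invented.
const report = {
  _id: 1,
  lake: "Odell Lake",
  catches: [
    { k: "kokanee", v: 4 },
    { k: "lake trout", v: 1 }
  ]
};

// Stage as it might be passed to db.reports.aggregate([...]);
// the collection name "reports" is an assumption.
const stage = { $project: { lake: 1, fish: { $arrayToObject: "$catches" } } };

// Stand-in for $arrayToObject: each { k, v } element becomes one field.
function arrayToObject(pairs) {
  return Object.fromEntries(pairs.map(({ k, v }) => [k, v]));
}

// Stand-in for $objectToArray, used only to demonstrate the round trip.
function objectToArray(obj) {
  return Object.entries(obj).map(([k, v]) => ({ k, v }));
}

const fish = arrayToObject(report.catches);
console.log(fish); // { kokanee: 4, 'lake trout': 1 }

// Round trip: converting to an array and back yields the original document.
const inventory = { "Des Moines": 1000, "Ames": 500 };
const roundTrip = arrayToObject(objectToArray(inventory));
console.log(roundTrip); // { 'Des Moines': 1000, Ames: 500 }
```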
$slice
The $slice aggregation operator enables you to return a subset of an array, counted either from the beginning or from the end of the array. To illustrate the utility of $slice, consider the dataset below that contains the favorite sweets for a handful of chefs.
Input:
Query:
Result:
The query returns at most two items from each chef’s list of favorite sweets. Notice that the results start from the beginning of the array and contain the first two items. If you want to select the last two items in an array, pass a negative number as the second parameter so that $slice counts from the last element of the array; the selected items still appear in their original order. For example, below is the same query as above, but with $slice counting from the end of the array.
Query:
Result:
As you can see from the results, the last two items in the array were selected. Note that for arrays with fewer than two items, $slice returns only the items that exist (for example, a single value for a one-item array).
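The two forms of $slice discussed above can be sketched as follows. The chefs, their sweets, and the collection name in the comment are invented for illustration, and the slice helper is a plain-JavaScript stand-in for the two-argument form of the operator only.

```javascript
// Hypothetical dataset of chefs and their favorite sweets.
const chefs = [
  { _id: 1, name: "Alice", sweets: ["cake", "pie", "donut"] },
  { _id: 2, name: "Bob", sweets: ["gelato"] }
];

// Stages as they might be passed to db.sweets.aggregate([...]);
// the collection name "sweets" and the field names are assumptions.
const firstTwoStage = { $project: { name: 1, top: { $slice: ["$sweets", 2] } } };
const lastTwoStage = { $project: { name: 1, top: { $slice: ["$sweets", -2] } } };

// Stand-in for the two-argument form { $slice: [ <array>, <n> ] }:
// a positive n takes the first n items, a negative n takes the last |n| items.
function slice(arr, n) {
  return n >= 0 ? arr.slice(0, n) : arr.slice(n);
}

const firstTwo = chefs.map((c) => ({ name: c.name, top: slice(c.sweets, 2) }));
const lastTwo = chefs.map((c) => ({ name: c.name, top: slice(c.sweets, -2) }));

console.log(firstTwo[0].top); // [ 'cake', 'pie' ]
console.log(lastTwo[0].top);  // [ 'pie', 'donut' ]
console.log(lastTwo[1].top);  // [ 'gelato' ]
```

In the negative case the items come back as pie, then donut: $slice picks them from the end of the array but does not reverse them.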
$range
The $range aggregation operator enables you to create an array of sequenced numbers. The inputs to the operator are a starting value, an end value (which, as in MongoDB, is not included in the output), and an optional non-zero increment. To highlight these capabilities, I use a series of examples of how to space out aid stations for long-distance bike races. Consider the following races and their respective distances. As the race director, I want to ensure that the riders have a water station every 20 miles so that they can get water and stay hydrated. I use the $range operator to indicate at which mile markers I should place the aid stations.
Input:
Query:
Result:
From the result, I can now see at which mile markers I need to place and staff water stations. If I want to know how many total water stations I am going to need per race, I could add another field to my document using the $addFields aggregation stage together with the $size aggregation operator.
Query:
Result:
The result is another field in the document, named “totalStations”, that indicates the size of the array or, in other words, the number of water stations needed.
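The water-station example can be sketched as follows. The race names, distances, field names (other than totalStations), and the collection name in the comment are assumptions made for illustration, and the range helper is a plain-JavaScript stand-in that mimics $range, including its exclusive end value.

```javascript
// Hypothetical races and their distances in miles.
const races = [
  { _id: 1, race: "Hypothetical Metric Century", distance: 62 },
  { _id: 2, race: "Hypothetical Century", distance: 100 }
];

// Pipeline as it might be passed to db.races.aggregate(pipeline);
// the collection name "races" and the field names are assumptions.
const pipeline = [
  { $project: { race: 1, waterStations: { $range: [0, "$distance", 20] } } },
  { $addFields: { totalStations: { $size: "$waterStations" } } }
];

// Stand-in for $range: start is included, end is excluded, step is non-zero.
function range(start, end, step = 1) {
  if (step === 0) throw new RangeError("step must be non-zero");
  const out = [];
  for (let i = start; step > 0 ? i < end : i > end; i += step) out.push(i);
  return out;
}

// Emulation of the $project and $addFields stages.
const withStations = races.map((r) => {
  const waterStations = range(0, r.distance, 20);
  return { _id: r._id, race: r.race, waterStations, totalStations: waterStations.length };
});

console.log(withStations[0].waterStations, withStations[0].totalStations); // [ 0, 20, 40, 60 ] 4
console.log(withStations[1].waterStations, withStations[1].totalStations); // [ 0, 20, 40, 60, 80 ] 5
```

Because the starting value is 0, the first station lands on the start line; starting the range at 20 instead would skip it. The 100-mile race gets no station at mile 100 because the end value is excluded.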
Arithmetic operator
In this release, we also added $mod, which has been one of the most requested arithmetic operators from our customers.
$mod
The $mod arithmetic operator enables you to perform modular math. Common use cases for $mod include determining whether a number is odd or even (even numbers % 2 return 0) and distributing people or items among a finite number of groups. For the dataset below, I want to determine how many leftover widgets I have if I ship my widgets in packages of 100.
Input:
In the aggregation query below, I also use the $addFields aggregation stage to simply add the remainder of the modulo math operation to the existing document as an added field.
Query:
Result:
From the result, you can see that the leftOver field holds the remainder of the count % 100 operation.
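As a minimal sketch of the leftover-widget calculation, consider the following. The widget names, counts, and the collection name in the comment are invented for illustration; the count and leftOver field names follow the text above, and the map call is a plain-JavaScript emulation of the $addFields stage rather than a database call.

```javascript
// Hypothetical widget inventory.
const widgets = [
  { _id: 1, name: "widget A", count: 1234 },
  { _id: 2, name: "widget B", count: 500 }
];

// Stage as it might be passed to db.widgets.aggregate([...]);
// the collection name "widgets" is an assumption.
const stage = { $addFields: { leftOver: { $mod: ["$count", 100] } } };

// Emulation: keep every existing field and add the remainder of count / 100.
const withLeftOver = widgets.map((w) => ({ ...w, leftOver: w.count % 100 }));

console.log(withLeftOver[0].leftOver); // 34
console.log(withLeftOver[1].leftOver); // 0

// The odd/even check mentioned above uses the same operation with a divisor of 2.
const isEven = (n) => n % 2 === 0;
console.log(isEven(1234), isEven(7)); // true false
```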
Summary
We continue to work backward from our customers and build the capabilities that they need. In this release, we added five new aggregation pipeline capabilities that include $objectToArray, $arrayToObject, $slice, $mod, and $range.
To get started with Amazon DocumentDB, you can use the Amazon DocumentDB getting started guide, or watch the following video. You can then use the same application code, drivers, and tools that you use with MongoDB today to start developing against Amazon DocumentDB. To learn more, see the Amazon DocumentDB product page. To learn more about migrations, please see the migration guide and learn how FINRA migrated to Amazon DocumentDB.
About the Author
Joseph Idziorek is a Principal Product Manager at Amazon Web Services.