AWS Database Blog

New Amazon DocumentDB (with MongoDB compatibility) aggregation pipeline operators: $objectToArray, $arrayToObject, $slice, $mod, and $range

Amazon DocumentDB (with MongoDB compatibility) is a fast, scalable, highly available, and fully managed document database service that supports MongoDB workloads. You can use the same MongoDB application code, drivers, and tools as you do today to run, manage, and scale workloads on Amazon DocumentDB without having to worry about managing the underlying infrastructure.

Today, Amazon DocumentDB has added support for five additional aggregation pipeline capabilities that allow you to compose powerful aggregations over your documents. The new capabilities include the $objectToArray, $arrayToObject, $slice, $mod, and $range aggregation pipeline operators.

We are constantly listening to our customers to build the capabilities that they want the most, and improve compatibility for the capabilities they actually use. See the documentation for more about the supported MongoDB APIs and aggregation pipeline capabilities for Amazon DocumentDB.

In this blog post, I introduce you to some of these new aggregation operators using common use cases to give you a quick start guide so that you can start using these capabilities in Amazon DocumentDB today.

Array aggregation operators

Amazon DocumentDB enables you to have documents with fields that have an array data type. Being able to query and manipulate arrays natively within the query language of the database can increase performance and simplify your application code by pushing the array manipulation down to the database. In this section, I show you how to use the four new array aggregation operators ($arrayToObject, $slice, $objectToArray, $range).

$objectToArray

The $objectToArray aggregation operator converts an object (or document) into an array. The input to the operator is a document and the output consists of an array element for each field-value pair in the input document.

To understand how $objectToArray works, I use an example dataset that tracks the inventory of video tapes for a fictious chain of video rental stores in Iowa. Although I initially modeled my data to categorize store inventory by a particular video title, there are times when I want to know which store has the greatest inventory of videos. To answer that question, I am going to use the $objectToArray aggregation operator.

Input:

Each document in the example dataset is a distinct video title and contains an embedded document that tracks the inventory for each video rental store location.

db.videos.insertMany([
{ "_id":1, "name":"Live Soft", "inventory": {"Des Moines": 1000, "Ames" : 500}},
{ "_id":2, "name":"Top Pilot", "inventory": {"Mason City": 250, "Des Moines": 1000}},
{ "_id":3, "name":"Romancing the Rock", "inventory": {"Mason City": 250, "Ames" : 500}},
{ "_id":4, "name":"Bravemind", "inventory": {"Mason City": 250, "Des Moines": 1000, "Ames" : 500}}
])

The query below utilizes multiple aggregations stages to answer the question that I posed, such as which store has the greatest inventory of videos. In the first stage, I use $objectToArray to convert the inventory document into an array so that I can more easily aggregate the store inventory. Below is the output of the first stage. As you can see, my document is now an array of key-value pairs.

Query:

db.videos.aggregate([
   { $project: {videos: { $objectToArray: "$inventory" }}}
])

Result:

{ "_id" : 1, "videos" : [ { "k" : "Des Moines", "v" : 1000 }, { "k" : "Ames", "v" : 500 } ] }
{ "_id" : 2, "videos" : [ { "k" : "Mason City", "v" : 250 }, { "k" : "Des Moines", "v" : 1000 } ] }
{ "_id" : 3, "videos" : [ { "k" : "Mason City", "v" : 250 }, { "k" : "Ames", "v" : 500 } ] }
{ "_id" : 4, "videos" : [ { "k" : "Mason City", "v" : 250 }, { "k" : "Des Moines", "v" : 1000 }, { "k" : "Ames", "v" : 500 } ] }

Now to answer the original question of which store has the greatest inventory of videos, I’m going to $unwind the arrays from the first stage, group the arrays by the city (such as key or “k”) and sum the quantity (such as value or “v”). Then, to return the store that has the greatest inventory, in my next stage I’m going to perform a descending $sort of the total value and use a final $limit stage to return only the top result.

Query:

db.videos.aggregate(
   [{ $project: {videos: { $objectToArray: "$inventory" }}},
    { $unwind: "$videos" },
    { $group: { _id: "$videos.k", total: { $sum: "$videos.v" } } },
    { $sort: {"total": -1}},
    { $limit : 1 }
   ]
)

Result:

{ "_id" : "Des Moines", "total" : 3000 }

The final result shows that Des Moines has the greatest inventory of videos with 3000. If I wanted to see the inventory for all stores, I would omit the final $limit stage from my aggregation query.

$ArrayToObject

Similar to $ObjectToArray, the $ArrayToObject aggregation operator converts an array of key-value pairs into a single document. As an input, $ArrayToObject expects that the array is already represented as one or more key-value pairs. For example, as you saw in the previous example:

“videos" : [ { "k" : "Des Moines", "v" : 1000 }, { "k" : "Ames", "v" : 500 } ]

The $arrayToObject operator is the reflexive of the $objectToArray operator. I can use $arrayToObject to take our output from the query above and return it back in its original form.

Query:

db.videos.aggregate(
   [ { $project: {name:1, videos: { $objectToArray: "$inventory" }}},
     { $project: {name:1, inventory: { $arrayToObject: "$videos" }}}
   ]
)

Result:

{ "_id" : 1, "name" : "Live Soft", "inventory" : { "Des Moines" : 1000, "Ames" : 500 } }
{ "_id" : 2, "name" : "Top Pilot", "inventory" : { "Mason City" : 250, "Des Moines" : 1000 } }
{ "_id" : 3, "name" : "Romancing the Rock", "inventory" : { "Mason City" : 250, "Ames" : 500 } }
{ "_id" : 4, "name" : "Bravemind", "inventory" : { "Mason City" : 250, "Des Moines" : 1000, "Ames" : 500 } }

Additionally, I can take arrays that are organized as key-value pairs and turn them into a document. Below is a sample dataset of a fishing report from a day out on Odell Lake.

Input:

db.fish.insertMany([
{ "_id" : 1, "report": [{"k":"Whitefish", "v":10}]},
{ "_id" : 2, "report": [{"k": "Dolly Varden", "v": 2}]},
{ "_id" : 3, "report": [{"k": "Rainbox Trout", "v": 33}]},
{ "_id" : 4, "report": [{"k": "Blueback Salmon", "v": 9}]}
])

With the $arrayToObject operator, you convert the array of key-value pairs of what fish were caught and how many into a single document.

Query:

db.fish.aggregate(
	[
     { $project: {caught: { $arrayToObject: "$report" }}}
   ]
)

Result:

{ "_id" : 1, "caught" : { "Whitefish" : 10 } }
{ "_id" : 2, "caught" : { "Dolly Varden" : 2 } }
{ "_id" : 3, "caught" : { "Rainbox Trout" : 33 } }
{ "_id" : 4, "caught" : { "Blueback Salmon" : 9 } }

$slice

The $slice aggregation operator enables you return a subset of an array by either traversing the array from the beginning or the end of the array. To illustrate the utility of $slice, consider the dataset below that contains the favorite sweets for a handful of chefs.

Input:

db.sweets.insertMany([
{ "_id" : 1, "name" : "Alvin", favorites: [ "chocolate", "cake", "toffee", "beignets" ] },
{ "_id" : 2, "name" : "Tom", favorites: [ "donuts", "pudding", "pie" ] },
{ "_id" : 3, "name" : "Jessica", favorites: [ "fudge", "smores", "pudding", "cupcakes" ] },
{ "_id" : 4, "name" : "Rachel", favorites: [ "ice cream" ] }
])

If I only want to return the first two favorites for each of the chefs, I can use $slice in the following aggregation query:

Query:

db.sweets.aggregate([
   { $project: { name: 1, threeFavorites: { $slice: [ "$favorites", 2] } } }
])

Result:

{ "_id" : 1, "name" : "Alvin", "threeFavorites" : [ "chocolate", "cake" ] }
{ "_id" : 2, "name" : "Tom", "threeFavorites" : [ "donuts", "pudding" ] }
{ "_id" : 3, "name" : "Jessica", "threeFavorites" : [ "fudge", "smores" ] }
{ "_id" : 4, "name" : "Rachel", "threeFavorites" : [ "ice cream" ] }

The output of the query yields at most two items for each of the chef’s favorite sweets. Notice that the query results started from the beginning of the array and selected the first two items. If you want to select the last two items in an array, you can use a negative number for the second parameter to specify starting with the last element in the array and traverse right to left. For example, below is the same query as above but with the ordering of the array traversal for $slice reversed.

Query:

db.sweets.aggregate([
   { $project: { name: 1, threeFavorites: { $slice: [ "$favorites", -2] } } }
])

Result:

{ "_id" : 1, "name" : "Alvin", "threeFavorites" : [ "toffee", "beignets" ] }
{ "_id" : 2, "name" : "Tom", "threeFavorites" : [ "pudding", "pie" ] }
{ "_id" : 3, "name" : "Jessica", "threeFavorites" : [ "pudding", "cupcakes" ] }
{ "_id" : 4, "name" : "Rachel", "threeFavorites" : [ "ice cream" ] }

As you can see from the results, the last two items in the array were selected. Note, for arrays with less than two items, only a single value is returned.

$range

The $range aggregation operator enables you to create an array of sequenced numbers. The inputs to the aggregation operator are a starting value and end value for the desired range of numbers and an optional non-zero incremental value. To highlight these capabilities, I use a series of examples of how to space out aid stations for long-distance bike races. Consider the following races and their respective distance. As the race director, I want to ensure that the riders have a water station every 20 miles so that they can get water and stay hydrated. I use the $range operator to indicate at what mile markers I should place the aid station.

Input:

db.races.insertMany([
{ _id: 0, race: "STP", distance: 206 },
{ _id: 1, race: "RSVP", distance: 160 },
{ _id: 2, race: "Chilly Hilly", distance: 33 },
{ _id: 3, race: "Flying Wheels", distance: 100 },
])

Query:

db.races.aggregate([
	{$project: {race: 1, "waterStations": { $range: [ 20, "$distance", 20 ] }}}
])

Result:

{ "_id" : 0, "race" : "STP", "waterStations" : [ 20, 40, 60, 80, 100, 120, 140, 160, 180, 200 ] }
{ "_id" : 1, "race" : "RSVP", "waterStations" : [ 20, 40, 60, 80, 100, 120, 140 ] }
{ "_id" : 2, "race" : "Chilly Hilly", "waterStations" : [ 20 ] }
{ "_id" : 3, "race" : "Flying Wheels", "waterStations" : [ 20, 40, 60, 80 ] }

From the result, I can now see for what mile markers I need to place and staff water stations. If I want to know how many total water stations I am going to need per race, I could add another field to my document using the $addFields aggregation stage as well as use the $size aggregation operators.

Query:

db.races.aggregate([
	{$project: {race: 1, "waterStations": { $range: [ 20, "$distance", 20 ] }}},
	{$addFields: {totalStations: {$size:"$waterStations"}}}
])

Result:

{ "_id" : 0, "race" : "STP", "waterStations" : [ 20, 40, 60, 80, 100, 120, 140, 160, 180, 200 ], "totalStations" : 10 }
{ "_id" : 1, "race" : "RSVP", "waterStations" : [ 20, 40, 60, 80, 100, 120, 140 ], "totalStations" : 7 }
{ "_id" : 2, "race" : "Chilly Hilly", "waterStations" : [ 20 ], "totalStations" : 1 }
{ "_id" : 3, "race" : "Flying Wheels", "waterStations" : [ 20, 40, 60, 80 ], "totalStations" : 4 }

The result is another field in the document entitled “totalStations” that indicates the size of the array, or in other words, the amount of water stations needed.

Arithmetic operator

In this release, we also added $mod, which has been one of the most requested arithmetic operators from our customers.

$mod

The $mod arithmetic operator enables you to perform modular math. Commons use cases for $mod include determining when a number is odd or even (even numbers % 2 return 0) or distributing people or items amongst a finite number of groups. For the dataset below, I want to determine how many leftover widgets I have if I ship my widgets in packages of 100.

Input:

db.widgets.insertMany([
{ "_id" : 1, "widget" : "A", "count" : 80372},
{ "_id" : 2, "widget" : "B", "count" : 409282},
{ "_id" : 3, "widget" : "C", "count" : 60739}
])

In the aggregation query below, I also use the $addFields aggregation stage to simply add the remainder of the modulo math operation to the existing document as an added field.

Query:

db.widgets.aggregate(
   [
     { $addFields: {leftOver: { $mod: [ "$count", 100 ] } } }
   ]
)

Result:

{ "_id" : 1, "widget" : "A", "count" : 80372, "leftOver" : 72 }
{ "_id" : 2, "widget" : "B", "count" : 409282, "leftOver" : 82 }
{ "_id" : 3, "widget" : "C", "count" : 60739, "leftOver" : 39 }

From the result, you can see that the leftOver field indicates the remainder of the count % 100 math.

Summary

We continue to work backward from our customers and build the capabilities that they need. In this release, we added five new aggregation pipeline capabilities that include $objectToArray, $arrayToObject, $slice, $mod, and $range.

To get started with Amazon DocumentDB, you can use the Amazon DocumentDB getting started guide, or watch the following video. You can then use the same application code, drivers, and tools that you use with MongoDB today to start developing against Amazon DocumentDB. To learn more, see the Amazon DocumentDB product page. To learn more about migrations, please see the migration guide and learn how FINRA migrated to Amazon DocumentDB.

 


About the Author

 

Joseph Idziorek is a Principal Product Manager at Amazon Web Services.