AWS Database Blog
Creating a REST API for Amazon DocumentDB (with MongoDB compatibility) with Amazon API Gateway and AWS Lambda
Representational state transfer (REST) APIs are a common architectural style for distributed systems. They benefit from being stateless and therefore enable efficient scaling as workloads increase. These convenient—yet still powerful—APIs are often paired with database systems to give programmatic access to data managed in a database. One request that customers have expressed is to have a REST API for access to their Amazon DocumentDB (with MongoDB compatibility) database, which is what this post discusses.
Amazon DocumentDB is a fast, scalable, highly available, and fully managed document database service that supports MongoDB workloads. As a document database, Amazon DocumentDB makes it easy to store, query, and index JSON data. The primary mechanism that users use to interact with Amazon DocumentDB is via the MongoDB drivers, which provide a stateful, session-based API.
Providing simple HTTP-based access to Amazon DocumentDB allows for the addition of document data to webpages, other services and microservices, and other applications needing database access.
In this post, I demonstrate how to build a REST API for read and write access to Amazon DocumentDB by using Amazon API Gateway, AWS Lambda, and AWS Secrets Manager.
Solution overview
The REST API in this post can perform insert, update, delete, and read operations against Amazon DocumentDB collections. Access can be restricted to particular collections, all collections in a particular database, or all collections in all databases. To accomplish this goal, we use the following services:
- Amazon DocumentDB – Stores our data
- API Gateway – Exposes an HTTP REST API
- Lambda – Connects the API Gateway service to the database
- Secrets Manager – Stores the database credentials for use by our Lambda function
A discussion of best practices for securing the API endpoints is beyond the scope of this post; for more information, see Controlling and managing access to a REST API in API Gateway. For this post, simple username-password authentication is implemented as a separate Lambda function. The following diagram illustrates the architecture of this solution.
I use the AWS Serverless Application Model (AWS SAM) to deploy this stack because it’s the preferred approach when developing serverless applications such as this one. The template and code are available in the GitHub repo.
The API exposes insert, update, delete, and find operations against data stored in the database.
Storing our data with Amazon DocumentDB (with MongoDB compatibility)
To read more about the architecture, value proposition, and attributes of Amazon DocumentDB, see 12 things you should know about Amazon DocumentDB (with MongoDB compatibility). This post begins with an already existing Amazon DocumentDB cluster that we expose via a REST API to applications. If you don’t already have an Amazon DocumentDB cluster, see Getting Started with Amazon DocumentDB (with MongoDB compatibility).
Storing database credentials with Secrets Manager
As is common with applications deployed in AWS that connect to Amazon DocumentDB, including Lambda functions, we use Secrets Manager to store credentials to connect to Amazon DocumentDB. This way, you can grant permissions to access those credentials to a role, such as the role used to run a Lambda function. After the application, or Lambda function, retrieves the credentials from Secrets Manager, it can use those credentials to make a database connection to Amazon DocumentDB.
Exposing a REST API with API Gateway
For this post, we use API Gateway to expose a REST API to the world. API Gateway supports several API types; REST is just one of them, and it is the one I focus on in this post.
API Gateway can define REST endpoints and the operations allowed on those endpoints. For this discussion, I use one endpoint per collection and define multiple operations on that endpoint:
- GET – Corresponds to a read or find database operation
- PUT or POST – Corresponds to an insert database operation
- PATCH – Corresponds to an update database operation
- DELETE – Corresponds to a delete database operation
The parameters that are sent with those operations mimic the parameters in the corresponding database operation. For example, the GET operation mimics the find() operation in the MongoDB API. In the MongoDB API, you can specify five parameters to the find() operation:
- A filter to identify documents of interest, specified as a JSON object
- A projection to specify what fields should be returned, specified as a JSON object
- A sort to specify the order of the results, specified as a JSON object
- A limit to specify the number of results to return, specified as an integer
- A skip amount to specify how many results to skip before returning results, specified as an integer
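As a rough illustration of this mapping, the following sketch converts string-valued request parameters into keyword arguments for the find() method of the MongoDB Python driver. The parameter names (filter, projection, sort, limit, skip) and the function name build_find_args are assumptions made for this example; the syntax actually implemented in the repository may differ.

```python
import json


def build_find_args(params):
    """Translate string-valued request parameters into find() keyword arguments.

    The parameter names used here are assumptions for illustration only.
    """
    params = params or {}
    return {
        # JSON object selecting the documents of interest (empty = all documents)
        "filter": json.loads(params.get("filter", "{}")),
        # JSON object naming the fields to return, or None for all fields
        "projection": json.loads(params["projection"]) if "projection" in params else None,
        # The Python driver expects sort as a list of (field, direction) pairs
        "sort": list(json.loads(params["sort"]).items()) if "sort" in params else None,
        # In the Python driver, a limit of 0 means "no limit"
        "limit": int(params.get("limit", 0)),
        "skip": int(params.get("skip", 0)),
    }
```

With a collection object from the Python driver, the result could then be passed along as collection.find(**build_find_args(params)).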
Similarly, the PATCH operation mimics the update() operation in the MongoDB API, which takes two parameters:
- A filter to identify which documents should be updated, specified as a JSON object
- An update command to indicate what update should be applied, specified as a JSON object (updates include setting or unsetting a field value, incrementing or decrementing a field value, and so on)
For more information about the MongoDB API, see the MongoDB documentation.
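A minimal sketch of how a PATCH request body could be validated before it reaches the database. It assumes the body is a JSON object with filter and update keys, and that the update is applied with the Python driver's update_many() method, which accepts only operator documents such as $set or $inc. Both the body layout and the function name build_update_args are hypothetical.

```python
import json


def build_update_args(body):
    """Parse a JSON request body into (filter, update) for update_many().

    The {"filter": ..., "update": ...} layout is an assumption for illustration.
    """
    doc = json.loads(body)
    flt = doc.get("filter", {})
    update = doc.get("update")
    if not isinstance(flt, dict):
        raise ValueError("filter must be a JSON object")
    if not isinstance(update, dict) or not update:
        raise ValueError("update must be a non-empty JSON object")
    # Reject plain documents; only operator updates ($set, $unset, $inc, ...) pass
    if not all(key.startswith("$") for key in update):
        raise ValueError("update must use operators such as $set or $inc")
    return flt, update
```

The returned pair could then be used as collection.update_many(flt, update).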
We use the path of the REST endpoint to specify the database and collection against which to operate. In API Gateway, you can specify the exact path, thereby specifying a particular database and collection, and the client calling the REST endpoint can’t change that. For example, API Gateway can expose a URL like http://<BASE_URL>/docdb/mydb/mycollection that corresponds to accessing the mycollection collection in the mydb database.
In addition, in API Gateway, you can allow for path variables to be added to a particular REST endpoint. This allows you to expose an endpoint to a particular database, but allows the user to specify the collection. This enables access to all collections in a particular database. For example, API Gateway can expose a URL like http://<BASE_URL>/docdb/mydb, which allows the caller to append a particular collection name to the URL, such as http://<BASE_URL>/docdb/mydb/another_collection. This allows access to any collection, including another_collection, inside the mydb database.
You can extend this idea further and expose a REST endpoint and allow the client to specify both the database and the collection as path variables, thereby exposing all collections in all databases. For example, API Gateway can expose a URL like http://<BASE_URL>/docdb/, which allows the caller to append both the database and the collection names to the URL, such as http://<BASE_URL>/docdb/other_db/other_collection. This allows access to any collection, including other_collection, inside any database, including the other_db database.
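Whichever of the three endpoint styles is exposed, the request path ends up carrying a database name and a collection name. The sketch below shows one way to extract them from a path of the form /docdb/<database>/<collection>. The function name resolve_namespace and the validation rules are assumptions for this example, not the repository's code.

```python
def resolve_namespace(path, prefix="/docdb"):
    """Return (database, collection) for a path like /docdb/<db>/<collection>."""
    if not path.startswith(prefix + "/"):
        raise ValueError("path must start with " + prefix + "/")
    # Drop the prefix and ignore empty segments caused by extra slashes
    parts = [segment for segment in path[len(prefix):].split("/") if segment]
    if len(parts) != 2:
        raise ValueError("expected exactly a database and a collection in the path")
    database, collection = parts
    # Conservative checks: no operator characters, no system collections
    if "$" in database or "$" in collection or collection.startswith("system."):
        raise ValueError("invalid database or collection name")
    return database, collection
```

For example, resolve_namespace("/docdb/mydb/mycollection") yields the pair ("mydb", "mycollection"), while a path with a missing or extra segment raises ValueError.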
Enforcing API Gateway security
From a security standpoint, as a best practice, access to these endpoints should be restricted via the security mechanisms in API Gateway. For more information, see Controlling and managing access to a REST API in API Gateway. API Gateway has several available authorization schemes, including Amazon Cognito, AWS Identity and Access Management (IAM), and Lambda functions. For the purposes of this post, I use a simple Lambda authorizer that compares the supplied username and password with static values (stored as environment variables in the Lambda function). I can choose to protect some or all of the API endpoints, and even protect the different endpoints differently (for example, with different username/password pairs, or different authorizers), but that is beyond the scope of this post.
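A minimal sketch of such an authorizer follows. It assumes a token-based Lambda authorizer that receives an HTTP Basic Authorization header in the authorizationToken field of the event, and that the expected values live in environment variables named API_USERNAME and API_PASSWORD. Those variable names and the use of Basic authentication are assumptions for this example; the authorizer in the repository may be wired differently.

```python
import base64
import hmac
import os


def build_policy(principal_id, effect, resource):
    """Build the IAM policy document that API Gateway expects from an authorizer."""
    return {
        "principalId": principal_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {"Action": "execute-api:Invoke", "Effect": effect, "Resource": resource}
            ],
        },
    }


def lambda_handler(event, context):
    # Hypothetical environment variable names holding the static credentials
    expected_user = os.environ.get("API_USERNAME", "")
    expected_pass = os.environ.get("API_PASSWORD", "")
    effect = "Deny"
    principal = "anonymous"
    try:
        scheme, _, encoded = event.get("authorizationToken", "").partition(" ")
        user, _, password = base64.b64decode(encoded).decode("utf-8").partition(":")
        if (
            scheme.lower() == "basic"
            and expected_user
            and expected_pass
            # compare_digest avoids leaking information through timing differences
            and hmac.compare_digest(user.encode(), expected_user.encode())
            and hmac.compare_digest(password.encode(), expected_pass.encode())
        ):
            effect = "Allow"
            principal = user
    except ValueError:
        # Malformed base64 or non-UTF-8 credentials: keep the default Deny
        pass
    return build_policy(principal, effect, event["methodArn"])
```

Static credentials in environment variables keep the example short, but they are only suitable for a demonstration.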
Connecting API Gateway to the database with Lambda
The component doing the heavy lifting in this solution is Lambda. We use a single, simple Lambda function to perform all the various operations exposed: GET (for find), PUT or POST (for insert), PATCH (for update), and DELETE (for delete).
When API Gateway calls the Lambda function, the incoming event contains several pieces of metadata, including the REST method that was invoked. The function uses that field to determine which subfunction to call, and returns that subfunction's results. The API path that was invoked is also sent as part of the event, and can be used to determine the database and collection that are to be queried.
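The dispatch described here can be sketched as follows, assuming the Lambda proxy integration event format, in which the method arrives in httpMethod and the path in path. The four subfunctions are placeholders that only report what they would do; their names are hypothetical and the real function would query the database instead.

```python
import json


# Placeholder subfunctions; a real implementation would run the database operation
def handle_find(event):
    return {"operation": "find", "path": event.get("path")}


def handle_insert(event):
    return {"operation": "insert", "path": event.get("path")}


def handle_update(event):
    return {"operation": "update", "path": event.get("path")}


def handle_delete(event):
    return {"operation": "delete", "path": event.get("path")}


# One entry per REST method exposed by the API
HANDLERS = {
    "GET": handle_find,
    "PUT": handle_insert,
    "POST": handle_insert,
    "PATCH": handle_update,
    "DELETE": handle_delete,
}


def lambda_handler(event, context):
    handler = HANDLERS.get(event.get("httpMethod"))
    if handler is None:
        return {"statusCode": 405, "body": json.dumps({"error": "method not allowed"})}
    return {"statusCode": 200, "body": json.dumps(handler(event))}
```

Keeping the method-to-subfunction mapping in a dictionary makes it easy to see at a glance which operations the single Lambda function supports.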
It’s worth talking about a few good practices for implementing this Lambda function.
Using Lambda layers
Lambda has a mechanism by which you can package up some commonly used libraries or packages so that you don’t need to package them up with each Lambda function that you deploy. For example, I do a lot of work with Lambda functions in Python connecting to Amazon DocumentDB, so many of my functions need the MongoDB drivers to make the connection. Because I’m using the best practice configuration, which uses SSL to communicate with the cluster, my functions also need the certificate file for connecting to Amazon DocumentDB.
You can package up these dependencies into a .zip file and create a Lambda layer. Then, when you create a Lambda function, like the CRUD operations function for this REST API, you can add the layer to your function to bring in those dependencies. This greatly simplifies deployment of Lambda functions. You can now easily compose Lambda functions directly on the AWS Management Console, because the dependencies are packaged up already. Additionally, if your Lambda function is simple, you can include the source code for your function directly in an AWS CloudFormation template, simplifying automated deployments as well.
For this post, I create a single Lambda layer that includes the MongoDB Python driver and the Amazon DocumentDB certificate file. When added to a Python Lambda function, the resources are available under the /opt directory.
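Because layer contents are unpacked under /opt, the function can look for the certificate file there at startup. The following sketch searches a couple of directories for a certificate bundle. The file names and the helper name find_ca_file are assumptions for this example; use whatever name the layer's .zip file actually contains.

```python
import os


def find_ca_file(
    names=("global-bundle.pem", "rds-combined-ca-bundle.pem"),
    search_dirs=("/opt", "/opt/python"),
):
    """Return the path of the first certificate bundle found in the layer directories.

    The default file names are assumptions; adjust them to match the layer contents.
    """
    for directory in search_dirs:
        for name in names:
            candidate = os.path.join(directory, name)
            if os.path.isfile(candidate):
                return candidate
    raise FileNotFoundError("no certificate bundle found in " + ", ".join(search_dirs))
```

Python packages shipped in a layer's python folder land in /opt/python, which the Lambda Python runtime already includes on its import path, so the driver itself needs no such lookup.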
Connecting to Amazon DocumentDB
If you make the connection to Amazon DocumentDB inside the handler for the Lambda function, you have to go through the process of connecting to the database on every call to the Lambda function. This is a wasteful and resource-intensive approach to connections.
As with other database services accessed from Lambda functions, the best practice is to make the connection outside of the handler itself. When AWS reuses the environment for another Lambda invocation, the connection is already made. For this post, I store the connection in a global variable and use that connection, unless it’s uninitialized, in which case I call a database connection subfunction.
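The global-variable pattern can be sketched as below. The helper names make_docdb_client and get_client are hypothetical, and the connection options shown (TLS with a certificate file, retryable writes disabled) reflect a typical Amazon DocumentDB configuration rather than the repository's exact code.

```python
_client = None  # module-level, so it survives while the execution environment is reused


def make_docdb_client(host, username, password, ca_file):
    """Open a TLS connection to the cluster (hypothetical helper)."""
    import pymongo  # imported lazily; assumed to be supplied by the Lambda layer

    return pymongo.MongoClient(
        host=host,
        port=27017,
        username=username,
        password=password,
        tls=True,
        tlsCAFile=ca_file,
        retryWrites=False,  # Amazon DocumentDB does not support retryable writes
    )


def get_client(factory):
    """Return the cached client, calling factory() only when none exists yet."""
    global _client
    if _client is None:
        _client = factory()
    return _client
```

Inside the handler, a call such as get_client(lambda: make_docdb_client(host, username, password, ca_file)) pays the connection cost only on a cold start; later invocations in the same environment reuse the cached client.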
Enforcing Lambda security
As stated earlier, we store our credentials for our Amazon DocumentDB cluster in Secrets Manager. I grant the permission to retrieve this secret to the role that the Lambda function uses. It’s a simple operation to add a subfunction to the Lambda function that retrieves those credentials, which you can then use to connect to the database.
Using this pattern is a nice way to not expose usernames or passwords in code or in the configuration of the Lambda function.
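A minimal sketch of such a subfunction, using the Secrets Manager get_secret_value call from boto3. It assumes the secret is stored as a JSON string with username and password keys; the function name get_credentials and that key layout are assumptions for this example.

```python
import json


def get_credentials(secret_id, client=None):
    """Fetch the database username and password from Secrets Manager.

    Assumes the secret value is JSON with "username" and "password" keys.
    """
    if client is None:
        import boto3  # available by default in the Lambda Python runtime

        client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=secret_id)
    secret = json.loads(response["SecretString"])
    return secret["username"], secret["password"]
```

Accepting the client as an optional argument keeps the function easy to exercise locally with a stand-in object, while the Lambda role's permission on the secret controls access in the deployed function.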
Creating a REST API for an Amazon DocumentDB cluster
Now, let’s walk through the steps to create a REST API for your Amazon DocumentDB cluster. For this example, I assume you currently have an Amazon DocumentDB cluster, and know the username and password for a user that can query collections in that cluster. If you don’t currently have an Amazon DocumentDB cluster up and running, use the following CloudFormation template to launch one.
To deploy this REST API, I use AWS SAM and the following repository. The important files in this repository are the template file, template.yaml, and the Lambda source code located in the docdb_rest folder, specifically app.py and auth.py.
- Clone the repository with the template and code:
- You need to build the .zip file for the Lambda layer that holds the database driver and certificate authority file to connect to Amazon DocumentDB. To do this, run the following command:
- Now you’re ready to build the serverless application via the sam command:
- When that is complete, deploy the serverless application:
- You need to answer several questions from the command line:
  - The stack name
  - The Region to deploy to
  - A prefix to be prepended to resources created by this stack (for easy identification on the console)
  - The identifier for the Amazon DocumentDB cluster
  - The username for accessing the Amazon DocumentDB cluster
  - The password for accessing the Amazon DocumentDB cluster
  - A VPC subnet with networking access to the Amazon DocumentDB cluster
  - A security group with access to the Amazon DocumentDB cluster
  - The username to use to protect the REST API
  - The password to use to protect the REST API
- Optionally, choose to confirm changes before deploying.
- You need to allow the AWS SAM CLI to create an IAM role.
- Optionally, choose to save the arguments to a configuration file, and choose a configuration file name and a configuration environment.
- When prompted, choose to deploy this change set.
- When the stack has finished deploying, you see a list of the resources and a notice:
- Make note of the APIRoot output printed at the successful deployment. You use this to test the API in the next step.
Testing the API
Now you can test the API by calling your REST endpoints via curl. To do so, you need to get the URL, which is the APIRoot output from the deployment. You can also retrieve this information from API Gateway.
- Let’s set environment variables to hold the root URL for the API, as well as the username and password to access the API:
Now we can issue some HTTP commands.
- Insert some data via PUT:
- Insert some data via POST:
- Retrieve all the data via GET:
- Retrieve just the joe document via GET:
- Retrieve just the joe document but only project the name field via GET:
- Update the jason document via PATCH:
- Delete the jason document via DELETE:
See the README file in the repository for more information on the REST API syntax implemented in the Lambda function.
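As an alternative to curl, the same kinds of requests can be built from Python's standard library. The sketch below is illustrative only and does not reproduce the post's original commands: it assumes HTTP Basic authentication, query-string parameters that carry JSON values (such as filter and projection), and an API root to which the database and collection names are appended. The helper name build_request is hypothetical, and the README remains the authority on the real syntax.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_request(api_root, db, coll, method, user, password, query=None, body=None):
    """Build (but do not send) a request against <api_root>/<db>/<coll>."""
    url = "{}/{}/{}".format(api_root.rstrip("/"), db, coll)
    if query:
        # Each query value is JSON-encoded, for example filter={"name": "joe"}
        url += "?" + urllib.parse.urlencode({k: json.dumps(v) for k, v in query.items()})
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(url, data=data, method=method)
    token = base64.b64encode("{}:{}".format(user, password).encode("utf-8")).decode("ascii")
    request.add_header("Authorization", "Basic " + token)
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request
```

A request built this way, such as build_request(root, "demodb", "democollection", "GET", user, password, query={"filter": {"name": "joe"}}), can be sent with urllib.request.urlopen(request) and the JSON response read from the returned object.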
Limiting access to certain collections
These commands were issued against the generic endpoint that interprets the database and collection from the URL. The API that was implemented has two other endpoints: one that specifies a fixed database (demodb) but allows access to all collections in that database, and one that specifies a specific collection (democollection) in a specific database (demodb). If only the endpoint to a specific collection is exposed, then only that collection can be accessed via REST commands. This allows you to grant broad or narrow access to the databases and collections as suits your needs.
Cleaning up
You can delete the resources created in this post by deleting the stack via the AWS CloudFormation console or the AWS Command Line Interface (AWS CLI). Your Amazon DocumentDB cluster is not deleted by this operation.
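If you prefer to script the cleanup, a minimal sketch using the boto3 CloudFormation client follows. The function name delete_stack is hypothetical, and the stack name you pass must be the one you chose at deployment time.

```python
def delete_stack(stack_name, client=None):
    """Delete a CloudFormation stack and wait until the deletion completes."""
    if client is None:
        import boto3  # requires AWS credentials with CloudFormation permissions

        client = boto3.client("cloudformation")
    client.delete_stack(StackName=stack_name)
    # Block until CloudFormation reports that the stack is gone
    client.get_waiter("stack_delete_complete").wait(StackName=stack_name)
```

As with deletion from the console, this removes only the resources created by the stack.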
Conclusion
In this post, I demonstrated how to create a REST API to gain read and write access to collections in an Amazon DocumentDB database. Amazon DocumentDB access is only available within an Amazon VPC, but you can access this REST API outside of the VPC. I also showed how to create a single Lambda function that serves as the bridge between the API Gateway REST API and the Amazon DocumentDB database, and supports insert, update, delete, and read operations. Finally, I showed how to use Lambda layers to simplify Lambda function development, and how to safely store database credentials in Secrets Manager for use by our Lambda function.
For more information about recent launches and blog posts, see Amazon DocumentDB (with MongoDB compatibility) resources.
About the Author
Brian Hess is a Senior Solution Architect Specialist for Amazon DocumentDB (with MongoDB compatibility) at AWS. He has been in the data and analytics space for over 20 years and has extensive experience with relational and NoSQL databases.