AWS Database Blog

Scale your connections with Amazon DocumentDB using mongobetween

Amazon DocumentDB (with MongoDB compatibility) is a fully managed native JSON document database that makes it easy and cost-effective to operate critical document workloads at virtually any scale without managing infrastructure. You can use the same application code, drivers, and tools that are compatible with the MongoDB API (versions 3.6, 4.0, and 5.0) to run, manage, and scale workloads on Amazon DocumentDB. As a document database, Amazon DocumentDB makes it straightforward to store, query, and index JSON data.

Modern applications built for serverless deployments using AWS Lambda or AWS Fargate, or for containerized deployments using Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS), are designed to scale on demand. During scale-up events, these applications may try to open a large number of connections to your Amazon DocumentDB cluster. During highly variable (spiky) workload periods, these applications open and close database connections at a high rate.

Each open connection consumes memory and CPU resources on the Amazon DocumentDB instance. Each instance, whether primary or replica, has its own connection limit, which scales with instance size. It can be challenging to ensure that application scale-up events don’t breach this connection limit. After the connection limit has been reached, Amazon DocumentDB rejects any further connection attempts and the application encounters connection exceptions. Sustained, frequent opening and closing of connections during spiky workload periods also causes performance and latency fluctuations on the Amazon DocumentDB cluster because of pressure on instance resources like CPU and memory.

To better manage and stabilize connections to Amazon DocumentDB for such workloads, you can use mongobetween, a lightweight MongoDB connection pooler written in Golang. Its primary function is to handle a large number of incoming connections and multiplex them across a smaller connection pool to one or more Amazon DocumentDB clusters.

In this post, we discuss how to configure mongobetween to scale the connections beyond the connection limit of an Amazon DocumentDB instance.

Solution overview

In this post, we use mongobetween as a connection pooler on an Amazon Elastic Compute Cloud (Amazon EC2) instance. mongobetween is configured with a fixed connection pool size to limit the number of connections to Amazon DocumentDB. Applications connect to mongobetween instead of directly to Amazon DocumentDB when they require more connections than the Amazon DocumentDB instance connection limit allows. mongobetween acts as a connection multiplexer, handling the many incoming connections from the applications and efficiently managing the smaller pool of connections to Amazon DocumentDB.
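The multiplexing idea can be illustrated with a toy model. The following sketch is not mongobetween's implementation; it uses Python's asyncio.Semaphore as a stand-in for a fixed pool of backend connections. The names POOL_SIZE, CLIENTS, handle_request, and simulate, and all the numbers, are assumptions chosen for illustration only.

```python
import asyncio

POOL_SIZE = 5   # hypothetical fixed number of backend connections
CLIENTS = 50    # hypothetical number of concurrent client requests


async def handle_request(pool: asyncio.Semaphore, stats: dict) -> None:
    """Borrow one backend connection, do some work, then give it back."""
    async with pool:  # waits here while all backend connections are busy
        stats["in_use"] += 1
        stats["peak"] = max(stats["peak"], stats["in_use"])
        await asyncio.sleep(0.01)  # stand-in for a database round trip
        stats["in_use"] -= 1
        stats["served"] += 1


async def simulate(clients: int = CLIENTS, pool_size: int = POOL_SIZE) -> dict:
    """Run many client requests through a fixed-size pool and collect stats."""
    pool = asyncio.Semaphore(pool_size)
    stats = {"in_use": 0, "peak": 0, "served": 0}
    await asyncio.gather(*(handle_request(pool, stats) for _ in range(clients)))
    return stats


if __name__ == "__main__":
    print(asyncio.run(simulate()))
```

In this model every client request is eventually served, but the number of backend connections in use at any moment never exceeds the pool size. Requests that arrive while the pool is busy simply wait their turn.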

The following diagram illustrates the architecture for this setup.

Prerequisites

Refer to the Prerequisites section in the sample code repository and complete the steps in the Setup mongobetween in the Amazon EC2 Instance section.

Create the test environment

To create the test environment, refer to Create the test environment in the sample code.

Run the sample application

We discuss two methods to connect to Amazon DocumentDB: directly and using mongobetween.

Connect to Amazon DocumentDB directly

In this approach, as we increase the number of processes in the Python script, the number of connections to the Amazon DocumentDB instances keeps increasing and eventually reaches the maximum number of connections allowed for the instance type. Therefore, the application has limited scalability. To simulate the scenario where the application opens connections within the connection limits of the Amazon DocumentDB instances, refer to the section Run Python script with direct connection to DocumentDB cluster with 200 concurrent processes in the sample code. To simulate the scenario where Amazon DocumentDB starts rejecting connection attempts beyond its limits, refer to the section Run a Python script with a direct connection to a DocumentDB cluster with 900 concurrent processes in the sample code.
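The arithmetic behind the direct-connection scenario can be sketched as follows. This is a simplified estimate, not the sample code from the repository: the function name direct_connection_outcome, the value of HYPOTHETICAL_LIMIT, and the assumption of one connection per process are all illustrative. Real drivers may open more than one connection per process, and the actual limit depends on the instance type.

```python
def direct_connection_outcome(processes: int, conns_per_process: int,
                              instance_limit: int) -> dict:
    """Estimate how many connection attempts an instance accepts or rejects."""
    requested = processes * conns_per_process
    accepted = min(requested, instance_limit)
    return {
        "requested": requested,
        "accepted": accepted,
        "rejected": requested - accepted,
    }


if __name__ == "__main__":
    HYPOTHETICAL_LIMIT = 500  # assumption: not a real DocumentDB instance limit
    for processes in (200, 900):
        print(processes, direct_connection_outcome(processes, 1, HYPOTHETICAL_LIMIT))
```

With these assumed numbers, 200 processes fit under the limit, while 900 processes leave 400 connection attempts rejected.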

Connect to Amazon DocumentDB using mongobetween

In this approach, as we increase the number of processes in the Python script, the number of open connections to the Amazon DocumentDB instances stays constant because it’s controlled by the mongobetween proxy. Therefore, the application can keep scaling connections to the proxy, and mongobetween serves the incoming requests from its fixed set of open connections. To simulate the scenario where the application connects to the mongobetween proxy instead of directly to Amazon DocumentDB, refer to the section Run test script with mongobetween connection pooling in the sample code.

High availability deployment options

The previous section demonstrates how a single mongobetween process works, multiplexing incoming connections from our application to Amazon DocumentDB. However, in production environments, you need high availability with no single point of failure, and you need multiple mongobetween instances running to meet the scale of your workload.

In this section, we discuss two common deployment approaches to make mongobetween highly available.

Sidecar deployment approach

You can run mongobetween as a sidecar to your containerized application. This adds the least latency between your application code and mongobetween, compared with the approach we discuss next, and it doesn’t need any complex networking setup between your application and mongobetween. However, each time the application scales out, the new mongobetween sidecar opens another set of connections to Amazon DocumentDB. You therefore have to make sure that the total number of outbound connections from all mongobetween sidecars doesn’t exceed the Amazon DocumentDB connection limits during a scaling event, which may limit the scaling capacity of your application in this mode of deployment. The following diagram illustrates this workflow.

Service-based deployment approach

The other method is to run mongobetween containers as a standalone service. Application containers connect to the mongobetween service, which routes each connection to one of the mongobetween pods, which in turn connects to Amazon DocumentDB. With this approach, the application can scale independently of mongobetween and the Amazon DocumentDB connection limits. The following diagram illustrates this workflow.
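To make the sizing difference between the two deployment approaches concrete, here is a rough back-of-the-envelope sketch. It assumes each mongobetween process holds at most pool_size connections to a single Amazon DocumentDB instance. The function names and all numbers are hypothetical.

```python
def backend_connections(proxy_processes: int, pool_size: int) -> int:
    """Upper bound on connections opened by all mongobetween processes."""
    return proxy_processes * pool_size


def compare_deployments(app_replicas: int, pool_size: int,
                        service_pods: int, instance_limit: int) -> dict:
    """Compare backend connection counts for sidecar and service deployments."""
    # Sidecar: one proxy per application replica, so pools grow with the app.
    sidecar = backend_connections(app_replicas, pool_size)
    # Service: proxies scale separately, so pools grow only with proxy pods.
    service = backend_connections(service_pods, pool_size)
    return {
        "sidecar": sidecar,
        "sidecar_within_limit": sidecar <= instance_limit,
        "service": service,
        "service_within_limit": service <= instance_limit,
    }


if __name__ == "__main__":
    # Hypothetical numbers: 100 app replicas, pool size 20, 4 proxy pods,
    # and an assumed instance connection limit of 1,000.
    print(compare_deployments(100, 20, 4, 1000))
```

Under these assumed numbers, the sidecar deployment would need up to 2,000 backend connections and breach the limit, while the service deployment would need at most 80 regardless of how many application replicas are running.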

Latency considerations

mongobetween acts only as a proxy: it accumulates incoming connections rather than rejecting them when a limit is exceeded. However, it can process only as many concurrent requests from the application as it has outgoing connections to Amazon DocumentDB; the remaining requests wait in a queue until a connection is freed up. Therefore, the application needs to be designed to tolerate higher latency than it would see when connecting directly to Amazon DocumentDB, and connection timeouts need to be adjusted accordingly. Exception handling with exponential backoff and retries is a good approach in the application design.
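A minimal sketch of retrying with exponential backoff might look like the following. It is a generic pattern, not code from the sample repository: TransientConnectionError is a hypothetical stand-in for whichever timeout or connection exceptions your driver raises, and the delay values are assumptions you would tune for your workload.

```python
import random
import time


class TransientConnectionError(Exception):
    """Hypothetical stand-in for a driver's timeout or connection error."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.1,
                      max_delay=5.0, sleep=time.sleep):
    """Call operation(), retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientConnectionError:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            # The cap doubles with each attempt, up to max_delay. A random
            # wait below the cap (jitter) stops clients retrying in lockstep.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
```

You would wrap each database call in a function and pass it as operation. The sleep parameter is injectable so that the retry logic can be exercised in tests without real waiting.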

Conclusion

In this post, we showed how you can configure mongobetween to scale connections beyond the connection limit of an Amazon DocumentDB instance. mongobetween acts as a connection multiplexer, handling the many incoming connections from the applications and efficiently managing the smaller pool of connections to Amazon DocumentDB. We also talked about high availability deployment options for mongobetween in a production environment.

If you have any feedback or questions, leave them in the comments section.


About the authors

Sourav Biswas is a Senior Amazon DocumentDB Specialist Solutions Architect at AWS. He has been helping Amazon DocumentDB customers successfully adopt the service and implement best practices around it. Before joining AWS, he worked extensively as an application developer and solutions architect for various NoSQL vendors.

Anshu Vajpayee is a Senior Amazon DocumentDB Specialist Solutions Architect at AWS. He has been helping customers adopt NoSQL databases and modernize applications using Amazon DocumentDB. Before joining AWS, he worked extensively with relational and NoSQL databases.