AWS Open Source Blog

How MongoDB and AWS Collaborated to Enable Running the Open Source MongoDB Kafka Connector in Managed Environments

Developers can use the open source MongoDB Connector for Apache Kafka with Amazon Managed Streaming for Apache Kafka (Amazon MSK), as described in this blog, and with Amazon MSK Serverless, as described in this blog, to connect to MongoDB Atlas over AWS PrivateLink. Although the connector secured connections with JVM-level certificates for encryption and authentication, customers had no way to supply their own custom certificates. This restriction meant that they could not exercise greater control over the trust chain. By allowing customers to supply their own certificates, the connector could offer the flexibility needed to meet each customer's specific security requirements.

In this blog, we describe new Kafka connector functionality that allows customers to specify certificates through the connector configuration. This gives them full control over how the connector uses the certificates and allows them to introduce custom certificates into the trust chain.

Kafka connector components

Apache Kafka is a distributed streaming platform originally created by LinkedIn and released as open source in 2011. Kafka is designed to handle large volumes of data and enables scalable, fault-tolerant, and high-throughput messaging between applications. A vast ecosystem of tools and integrations exists around it, making Apache Kafka widely recognized as the de facto standard for streaming data processing.

Kafka Connect is an open source component that helps integrate data sources such as MongoDB with the Apache Kafka ecosystem. It provides an API that third-party connectors use to exchange data between Kafka and their data sources. The MongoDB Kafka Connector is an open source connector written in Java that integrates natively with the Kafka Connect framework, making MongoDB data available within the Kafka ecosystem. The connector supports both source and sink modes.

MongoDB is a document database built on a horizontal scale-out architecture that uses a flexible schema for storing data. Founded in 2007, MongoDB has a worldwide following in the developer community. MongoDB Atlas is a cloud-based developer data platform that includes an integrated suite of services designed to increase developer productivity. Because Atlas provides functionality such as search, online archive, mobile sync, triggers, and functions out of the box, developers can spend more time adding business value to their code than wiring up disparate services. Just as MongoDB offers Atlas as a managed service, AWS offers a managed Kafka product: Amazon Managed Streaming for Apache Kafka (Amazon MSK). Amazon MSK is a fully managed, highly available Apache Kafka service.

AWS and MongoDB collaboration

AWS and MongoDB collaborated through the open source MongoDB Kafka Connector project to add the missing functionality to the connector. Anyone in the community can contribute to the MongoDB Connector for Apache Kafka open source project by creating a pull request (PR). The MongoDB engineering team reviews these PRs to ensure the code conforms to internal standards and works with the PR developer to address any open issues. After the review is complete, the code is merged from the PR into the main branch, where it becomes available in the next release of the connector.

Here is how the collaboration unfolded. The first step was to create a Jira issue to report a problem or request a feature from the MongoDB engineering team. In our case, an AWS engineer raised a Jira ticket asking for support for SSL configuration in the connector properties. This was needed to make it easier to run the connector securely with Amazon MSK. The plan was to add the following configuration parameters:

connection.ssl.trustStore=<your path to truststore>
connection.ssl.trustStorePassword=<your truststore password>
connection.ssl.keyStore=<your path to keystore>
connection.ssl.keyStorePassword=<your keystore password>
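To illustrate how properties like the four above might be used, here is a minimal Java sketch that turns them into a javax.net.ssl.SSLContext using only standard JDK APIs. It is not the connector's actual implementation: the class name SslConfigSketch and its methods are hypothetical, and the fallback to the JVM defaults when a property is absent is our assumption about reasonable behavior.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.KeyStore;
import java.util.HashMap;
import java.util.Map;
import javax.net.ssl.KeyManager;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;

// Hypothetical helper for illustration only; not the connector's actual code.
final class SslConfigSketch {

    static char[] toChars(String password) {
        return password == null ? null : password.toCharArray();
    }

    // Reads a keystore or truststore file in the JVM's default store format.
    static KeyStore loadStore(String path, String password) throws Exception {
        KeyStore store = KeyStore.getInstance(KeyStore.getDefaultType());
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            store.load(in, toChars(password));
        }
        return store;
    }

    // Builds an SSLContext from the connector-style properties.
    // Assumption: a missing store path means "use the JVM defaults".
    static SSLContext buildSslContext(Map<String, String> config) throws Exception {
        TrustManager[] trustManagers = null; // null -> JVM default trust managers
        String trustStorePath = config.get("connection.ssl.trustStore");
        if (trustStorePath != null) {
            KeyStore trustStore =
                loadStore(trustStorePath, config.get("connection.ssl.trustStorePassword"));
            TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
            tmf.init(trustStore);
            trustManagers = tmf.getTrustManagers();
        }

        KeyManager[] keyManagers = null; // null -> JVM default key managers
        String keyStorePath = config.get("connection.ssl.keyStore");
        if (keyStorePath != null) {
            String keyStorePassword = config.get("connection.ssl.keyStorePassword");
            KeyStore keyStore = loadStore(keyStorePath, keyStorePassword);
            KeyManagerFactory kmf =
                KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
            // Assumption: the key entries share the keystore's password.
            kmf.init(keyStore, toChars(keyStorePassword));
            keyManagers = kmf.getKeyManagers();
        }

        SSLContext context = SSLContext.getInstance("TLS");
        context.init(keyManagers, trustManagers, null);
        return context;
    }

    public static void main(String[] args) throws Exception {
        // With no SSL properties set, the sketch falls back to the JVM defaults.
        SSLContext context = buildSslContext(new HashMap<>());
        System.out.println("Protocol: " + context.getProtocol());
    }
}
```

A context built this way could then be handed to the MongoDB Java driver, for example through the SSL settings on MongoClientSettings. How the released connector wires this up internally may differ from this sketch.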

The next step was to code and test the changes. The MongoDB Kafka Connector is a Gradle-based Java project. Because it is open source, anybody can clone it and build it locally. Since Amazon MSK Connect is based on open source Apache Kafka Connect and allows custom plugins, we were able to deploy our newly built jar as a custom plugin in Amazon MSK Connect. The detailed steps are described in this blog. We then updated the connector properties to include the new parameters for SSL certificates.

Once the functional tests succeeded, we headed to the GitHub repository to create our PR. We had a discussion with the upstream engineering team about the implementation via GitHub PR comments. The engineering team suggested an alternative implementation which resulted in the submission of a new and amended pull request with the recommended improvements. After the reviewers were in agreement with the new code, they merged the updated PR into the main branch. At this point the functionality became available for anybody to test, but it was not officially released yet. Customers who were interested in this feature could now build off the main branch and do initial testing.

Since the MongoDB Kafka Connector follows a quarterly release cycle, our PR was included in the next release (version 1.10). At that point, our change became part of the official MongoDB Kafka Connector and received an official release tag on GitHub. Customers can now obtain the released connector from the official Maven repository without having to build it themselves.

Contribute to the project

In conclusion, the overall experience of writing code, having it reviewed by MongoDB, and getting it merged into the main branch was straightforward. Because we used the Amazon MSK Connect service, there was no infrastructure to maintain, which made it easy to test our changes on AWS. The new feature benefits both AWS-managed and self-managed Kafka deployments, helping a wider circle of users and improving the experience of other developers.

If you are interested in contributing to the project, submit your PR at https://github.com/mongodb/mongo-kafka/pulls
Try Amazon MSK
Try MongoDB Atlas on AWS

Igor Alekseev

Igor Alekseev is a Senior Partner Solution Architect at AWS in the Data and Analytics domain. In his role, Igor works with strategic partners, helping them build complex, AWS-optimized architectures. Prior to joining AWS, as a Data/Solution Architect, he implemented many projects in the Big Data domain, including several data lakes in the Hadoop ecosystem. As a Data Engineer, he was involved in applying AI/ML to fraud detection and office automation. Igor's projects spanned a variety of industries, including communications, finance, public safety, manufacturing, and healthcare. Earlier, Igor worked as a full stack software engineer/tech lead.

Ed Berezitsky

Ed Berezitsky is a Principal Streaming Architect at Amazon Web Services. Ed helps customers design and implement solutions using streaming technologies, and specializes in Amazon MSK and Apache Kafka.

Robert Walters

Robert Walters is a Senior Product Manager at MongoDB. Prior to MongoDB, Rob spent 17 years at Microsoft working in various roles, including program management on the SQL Server team, consulting, and technical pre-sales. Rob has co-authored three patents for technologies used within SQL Server and was the lead author of several technical books on SQL Server. Rob is an active blogger on MongoDB Blogs.