Posted On: Feb 6, 2023

AWS Glue crawlers support MongoDB. They extract the data schema and automatically populate the AWS Glue Data Catalog, which keeps the metadata current. Today we are expanding that support to include MongoDB Atlas. This makes it much simpler to bring metadata from managed MongoDB Atlas clusters into the AWS Glue Data Catalog. Data engineers can then integrate MongoDB Atlas data with Amazon S3-based data lakes and extract meaningful insights.

With today’s launch, you can create and schedule a Glue crawler to crawl MongoDB Atlas. In the Glue crawler console, select MongoDB as the data source. Then create a Glue connection with the connection type “DocumentDB/MongoDB” and provide the MongoDB Atlas cluster information and credentials. Once the connection is created, specify the MongoDB Atlas database and collections to crawl. On each run, the crawler inspects the specified collections and records their metadata in the AWS Glue Data Catalog, including any updates to or deletions of MongoDB Atlas collections, views, and materialized views. You can then use the AWS Glue Data Catalog as a source in AWS Glue to pull data from MongoDB Atlas and write it to an Amazon S3 target.
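The console steps above correspond to two AWS Glue API calls: CreateConnection and CreateCrawler. The sketch below only builds the request payloads as plain dictionaries and does not contact AWS. It assumes a connection type of `MONGODB`, an SRV-style Atlas connection URL passed as `CONNECTION_URL`, and crawler targets given as `MongoDBTargets` with `Path` in the form `database/collection`. Every connection name, URL, role ARN, catalog database, and schedule shown is a hypothetical placeholder.

```python
"""Build request payloads for AWS Glue CreateConnection and CreateCrawler.

A sketch only: all names, URLs, role ARNs and schedules are hypothetical
placeholders. The dicts mirror the shape of the Glue API requests; nothing
is sent to AWS here.
"""
from typing import Any, Dict, List


def mongodb_atlas_connection_input(
    name: str, connection_url: str, username: str, password: str
) -> Dict[str, Any]:
    """Return a ConnectionInput for a MongoDB-type Glue connection."""
    return {
        "Name": name,
        "ConnectionType": "MONGODB",
        "ConnectionProperties": {
            # Assumption: Atlas clusters are addressed with a mongodb+srv:// URL.
            "CONNECTION_URL": connection_url,
            "USERNAME": username,
            "PASSWORD": password,
        },
    }


def mongodb_crawler_request(
    name: str,
    role_arn: str,
    catalog_database: str,
    connection_name: str,
    collections: List[str],
    schedule: str,
) -> Dict[str, Any]:
    """Return CreateCrawler arguments that crawl the given MongoDB paths."""
    return {
        "Name": name,
        "Role": role_arn,
        # Data Catalog database that receives the crawled table definitions.
        "DatabaseName": catalog_database,
        "Targets": {
            "MongoDBTargets": [
                # Path is "<database>/<collection>"; ScanAll=False samples records.
                {"ConnectionName": connection_name, "Path": path, "ScanAll": False}
                for path in collections
            ]
        },
        # Glue schedules use cron() expressions.
        "Schedule": schedule,
        # Apply schema updates and mark removed collections as deprecated.
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            "DeleteBehavior": "DEPRECATE_IN_DATABASE",
        },
    }


if __name__ == "__main__":
    conn = mongodb_atlas_connection_input(
        "atlas-conn",  # hypothetical connection name
        "mongodb+srv://cluster0.example.mongodb.net/sales",  # hypothetical URL
        "glue_reader",
        "change-me",
    )
    crawler = mongodb_crawler_request(
        "atlas-sales-crawler",
        "arn:aws:iam::123456789012:role/GlueCrawlerRole",  # hypothetical role
        "atlas_catalog",
        conn["Name"],
        ["sales/orders", "sales/customers"],
        "cron(0 2 * * ? *)",  # daily at 02:00 UTC
    )
    # With boto3 and AWS credentials, these could be submitted as:
    #   glue = boto3.client("glue")
    #   glue.create_connection(ConnectionInput=conn)
    #   glue.create_crawler(**crawler)
    #   glue.start_crawler(Name=crawler["Name"])
```

Keeping payload construction separate from the boto3 calls makes the configuration easy to review or test before anything is created in an account. Setting `ScanAll` to `False` trades completeness of schema inference for faster crawls on large collections.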

AWS Glue crawler support for MongoDB Atlas is generally available in all commercial AWS Regions where AWS Glue is available. To learn more, read the blog and visit the AWS Glue crawler documentation.