Amazon DocumentDB (with MongoDB compatibility) FAQs

General

Amazon DocumentDB is a serverless, fully managed, MongoDB API-compatible document database service. It removes the undifferentiated heavy lifting of database management tasks such as patching, backups, and monitoring. Amazon DocumentDB provides improved resilience and low latency with Global Clusters and leading security and compliance built to satisfy the requirements of high-sensitivity organizations such as global banks. It offers low total cost of ownership (TCO) with transparent pricing and no hidden costs. Its memory-optimized instances offer up to 43% cost savings compared to other popular document databases. I/O-Optimized provides improved price performance with up to 40% cost savings for I/O-intensive applications. Amazon DocumentDB is compatible with MongoDB APIs and drivers so you can migrate applications, typically without application code changes or downtime.

Document databases are one of the fastest growing categories of NoSQL databases, as they offer both flexible schemas and extensive query capabilities. The document model is a great choice for use cases with dynamic datasets that require ad-hoc querying, indexing, and aggregations. With the scale that Amazon DocumentDB provides, it is used by a wide variety of customers for use cases such as content management, personalization, catalogs, mobile and web applications, IoT, semantic search, and user profile management.

“MongoDB compatibility” means that Amazon DocumentDB interacts with the Apache 2.0 open source MongoDB APIs. As a result, you can use the same MongoDB drivers, applications, and tools with Amazon DocumentDB with little or no changes. While Amazon DocumentDB supports a vast majority of the MongoDB APIs that customers use, it does not support every MongoDB API. Our focus is to deliver the capabilities that customers need.

We work backwards from customer needs and deliver capabilities, such as MongoDB API-compatibility, transactions, and sharding. To learn more about the supported MongoDB APIs, see our compatibility documentation. To learn about recent Amazon DocumentDB launches, see our What's New Feed.

No. Amazon DocumentDB does not utilize any MongoDB SSPL code and thus is not restricted by this license. Instead, Amazon DocumentDB interacts with the Apache 2.0 open-source MongoDB APIs. We continue to listen and work backward from our customers to deliver the capabilities that they need. To learn more about the supported MongoDB APIs, see the compatibility documentation.

Customers can use AWS Database Migration Service (DMS) to migrate their on-premises or Amazon Elastic Compute Cloud (EC2) MongoDB databases to Amazon DocumentDB with virtually no downtime. With DMS, you can migrate from a MongoDB replica set or from a sharded cluster to Amazon DocumentDB. Additionally, you can use most existing tools to migrate data from a MongoDB database to Amazon DocumentDB, including mongodump/mongorestore, mongoexport/mongoimport, and third-party tools that support Change Data Capture (CDC) via the oplog. For more information, see Migrating to Amazon DocumentDB.

No, Amazon DocumentDB works with a vast majority of MongoDB APIs, drivers, and tools compatible with MongoDB versions 3.6, 4.0, and 5.0.

Yes. With the launch of support for MongoDB 4.0 compatibility, Amazon DocumentDB supports the ability to perform atomicity, consistency, isolation, durability (ACID) transactions across multiple documents, statements, collections, and databases. To learn more, see Transactions in Amazon DocumentDB in our documentation.
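As a rough illustration of a multi-document transaction, the following Python sketch moves a balance between two documents through the session API that MongoDB drivers such as PyMongo expose. The database name, the accounts collection, and the balance field are hypothetical, and the client argument is assumed to be an already connected pymongo.MongoClient.

```python
def transfer(client, db_name, from_id, to_id, amount):
    """Move `amount` between two account documents in one transaction.

    Assumptions (not from the FAQ): an "accounts" collection whose
    documents carry a numeric "balance" field.
    """
    accounts = client[db_name]["accounts"]
    with client.start_session() as session:
        # In PyMongo, the transaction commits when the block exits
        # normally and aborts if an exception is raised inside it.
        with session.start_transaction():
            accounts.update_one(
                {"_id": from_id}, {"$inc": {"balance": -amount}}, session=session
            )
            accounts.update_one(
                {"_id": to_id}, {"$inc": {"balance": amount}}, session=session
            )
```

Both updates receive the same session, so they either take effect together or not at all.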

No, Amazon DocumentDB does not follow the same support lifecycles as MongoDB and MongoDB's EOL schedule does not apply to Amazon DocumentDB.

Amazon DocumentDB instances are deployed within a customer's Amazon Virtual Private Cloud (Amazon VPC) and can be accessed directly by Amazon Elastic Compute Cloud (Amazon EC2) instances or other AWS services that are deployed in the same VPC. Additionally, Amazon DocumentDB can be accessed by Amazon EC2 instances or other AWS services in different VPCs in the same Region or other Regions via VPC peering. Access to Amazon DocumentDB instances must be done through the mongo shell or with MongoDB drivers. Amazon DocumentDB requires that you authenticate when connecting to a cluster. For additional options, see Connecting to an Amazon DocumentDB instance from Outside an Amazon VPC.
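Because access goes through MongoDB drivers and requires authentication, an application typically starts from a connection URI. The sketch below assembles one in Python. The host name, the default port 27017, the CA bundle file name, and the choice of options (tls, tlsCAFile, retryWrites) are assumptions for illustration; check the connection string shown for your own cluster.

```python
from urllib.parse import quote_plus


def build_docdb_uri(user, password, host, port=27017, ca_file="global-bundle.pem"):
    """Build a MongoDB connection URI for a TLS-enabled cluster.

    The port, CA file name, and option set are illustrative assumptions.
    """
    # Percent-encode credentials so characters like "@" or "/" stay valid.
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    options = f"tls=true&tlsCAFile={ca_file}&retryWrites=false"
    return f"mongodb://{credentials}@{host}:{port}/?{options}"


# Usage (hypothetical endpoint); the result can be passed to a driver,
# for example pymongo.MongoClient(uri).
uri = build_docdb_uri("app", "p@ss/word", "example.cluster-abc.us-east-1.docdb.amazonaws.com")
```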

For certain management features such as instance lifecycle management, encryption-at-rest with AWS Key Management Service (KMS) keys, and security groups management, Amazon DocumentDB leverages operational technology that is shared with Amazon Relational Database Service (RDS) and Amazon Neptune. When using the describe-db-instances and describe-db-clusters AWS CLI commands, we recommend filtering for Amazon DocumentDB resources using the following parameter: "--filter Name=engine,Values=docdb".
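The same engine filter can be applied from an SDK. The following Python sketch assumes a boto3 "docdb" client is passed in and returns only the identifiers of clusters whose engine is docdb. The helper name is hypothetical, and pagination is left out for brevity.

```python
# Filter equivalent to the CLI parameter Name=engine,Values=docdb.
DOCDB_ENGINE_FILTER = [{"Name": "engine", "Values": ["docdb"]}]


def list_docdb_cluster_ids(client):
    """Return identifiers of clusters whose engine is docdb.

    `client` is assumed to be boto3.client("docdb"); only the first
    page of results is read in this sketch.
    """
    response = client.describe_db_clusters(Filters=DOCDB_ENGINE_FILTER)
    return [cluster["DBClusterIdentifier"] for cluster in response["DBClusters"]]


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   print(list_docdb_cluster_ids(boto3.client("docdb")))
```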

Please see the Amazon DocumentDB pricing page for current information on available instance types per region.

To try Amazon DocumentDB, please see the Getting Started guide.

Yes, Amazon DocumentDB offers a Service Level Agreement of 99.99% uptime, which applies separately to each account using Amazon DocumentDB. For more information, please see Amazon DocumentDB (with MongoDB compatibility) Service Level Agreement.
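To put the percentage in concrete terms, the arithmetic below converts an uptime figure into the downtime it leaves room for. This is plain arithmetic, not the SLA's own measurement method; the 30-day month is an assumption, and the Service Level Agreement itself defines how uptime is calculated.

```python
def allowed_downtime_minutes(uptime_percent, days=30):
    """Minutes of downtime that fit within an uptime percentage.

    Assumes a billing month of `days` days; the SLA document defines
    the actual measurement rules.
    """
    total_minutes = days * 24 * 60
    return total_minutes * (100 - uptime_percent) / 100


# 99.99% of a 30-day month leaves roughly 4.32 minutes.
minutes = allowed_downtime_minutes(99.99)
```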

The open source DocumentDB project, under the stewardship of the Linux Foundation, aims to provide the developer community with a PostgreSQL-based, 100% MongoDB API-compatible document database. In August 2025, AWS announced it is joining this project as a member of the technical steering committee.

While both open source DocumentDB and Amazon DocumentDB use DocumentDB in their name and are MongoDB API-compatible, they are different pieces of software. While Amazon DocumentDB is built by AWS, open source DocumentDB is an extension of PostgreSQL. AWS will invest in both Amazon DocumentDB and open source DocumentDB akin to how we invest in Amazon OpenSearch Service and OpenSearch. We will contribute Amazon DocumentDB innovations to the open source project, and adopt features and capabilities from open source DocumentDB to our managed Amazon DocumentDB service.

Serverless

Amazon DocumentDB Serverless is an on-demand, auto scaling configuration for Amazon DocumentDB. It automatically scales capacity up or down in fine-grained increments based on your application's demand, offering up to 90% cost savings compared to provisioning for peak capacity. For applications with variable workloads, Amazon DocumentDB Serverless offers simplified resource management, with no upfront commitments or additional costs, so you only pay for the database capacity used. Amazon DocumentDB Serverless provides the same MongoDB-compatible APIs and capabilities as Amazon DocumentDB, including read replicas, Performance Insights, and I/O-Optimized storage.

With Amazon DocumentDB Serverless, you create a database, specify the desired range for database capacity, and connect your application. Amazon DocumentDB automatically adjusts the capacity within the range specified based on your application’s needs. You pay on a per-second basis for the database capacity you use when the database is active.

Amazon DocumentDB Serverless is available starting with Amazon DocumentDB 5.0 for both new and existing clusters.

Yes, you can switch between Serverless and provisioned database resources at any time. Before switching, make sure your workload remains sufficiently performant. You can test the desired configuration by cloning your DocumentDB cluster and applying the configuration to the cloned cluster before applying the same changes to your production environment. You can also easily fall back by switching to a previous configuration at any time.

Yes, you can set the capacity explicitly to a specific value using the AWS Management Console, the AWS CLI, or the Amazon DocumentDB API. 

Yes, you can start using Amazon DocumentDB Serverless to manage database compute capacity in your existing Amazon DocumentDB instance. A cluster containing both provisioned instances as well as Amazon DocumentDB Serverless is referred to as a mixed-configuration cluster. You can choose to have any combination of provisioned instances and Amazon DocumentDB Serverless in your cluster. 

Amazon DocumentDB Serverless supports the same MongoDB-compatible APIs and capabilities as Amazon DocumentDB, including Transactions, AWS Availability Zones, and Performance Insights. It does not support Elastic Clusters.

In Amazon DocumentDB Serverless, database capacity is measured in Amazon DocumentDB Capacity Units (DCUs). You pay a flat rate per second of DCU usage. Compute costs for running your workloads on Amazon DocumentDB Serverless will depend on the database cluster configuration that you choose: Amazon DocumentDB Standard or Amazon DocumentDB I/O-Optimized storage. For current information about pricing and Regional availability, visit the Amazon DocumentDB pricing page.
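The per-second billing model can be sketched as simple arithmetic: sum the DCU-seconds consumed and multiply by a rate. In the Python sketch below, the interval list and the price are hypothetical placeholders, and the rate is expressed per DCU-hour as an assumption; use the figures on the pricing page for real estimates.

```python
def serverless_compute_cost(usage_intervals, price_per_dcu_hour):
    """Estimate compute cost from (seconds, dcus) usage intervals.

    `price_per_dcu_hour` is a placeholder rate, not an actual price.
    """
    dcu_seconds = sum(seconds * dcus for seconds, dcus in usage_intervals)
    return dcu_seconds / 3600 * price_per_dcu_hour


# One hour at 2 DCUs plus half an hour at 8 DCUs is 6 DCU-hours.
estimate = serverless_compute_cost([(3600, 2.0), (1800, 8.0)], price_per_dcu_hour=0.10)
```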

Performance and scaling

When writing to storage, Amazon DocumentDB only persists write-ahead logs, and does not need to write full buffer page syncs. As a result of this optimization, which does not compromise durability, Amazon DocumentDB writes are typically faster than traditional databases. Amazon DocumentDB instances can scale out to millions of reads per second with up to 15 read replicas.

Amazon DocumentDB scales in two dimensions: storage and compute. Storage automatically scales from 10 GiB to 128 TiB in instance-based clusters, and up to 4 PiB for Amazon DocumentDB Elastic Clusters. Compute can be scaled vertically by creating larger instances and horizontally (for greater read throughput) by adding replica instances to the cluster.

The minimum storage is 10 GiB. Based on your cluster usage, your Amazon DocumentDB storage will automatically grow, up to 128 TiB in 10 GiB increments with no impact on performance. With Amazon DocumentDB Elastic Clusters, storage will automatically grow up to 4 PiB in 10 GiB increments. For either case, there is no need to provision storage in advance.
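The growth rule above can be modeled as rounding usage up to the next 10 GiB step, with a 10 GiB floor and a ceiling at the cluster limit. This Python sketch is only a model of the described behavior for instance-based clusters, not the service's internal accounting; the function name is hypothetical.

```python
import math

INCREMENT_GIB = 10
MAX_INSTANCE_BASED_GIB = 128 * 1024  # 128 TiB expressed in GiB


def grown_storage_gib(used_gib, max_gib=MAX_INSTANCE_BASED_GIB):
    """Round usage up to the next 10 GiB step, between 10 GiB and the limit."""
    if used_gib > max_gib:
        raise ValueError("usage exceeds the maximum volume size")
    steps = max(1, math.ceil(used_gib / INCREMENT_GIB))
    return min(steps * INCREMENT_GIB, max_gib)
```

For Elastic Clusters, the same model would apply with max_gib set to 4 PiB expressed in GiB.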

Pricing

For current information about pricing and Region availability, please refer to the Amazon DocumentDB pricing page.

Yes, you can try Amazon DocumentDB for free using a one month free trial. Your organization gets up to 750 hours of t3.medium instance usage, 30 million IOs, 5 GB of storage, and 5 GB of backup storage. Once your one month free trial expires or your usage exceeds the free allowance, you can shut down your cluster to avoid any charges, or keep it running at our standard on-demand rates. To learn more, refer to the DocumentDB free trial page.

Amazon DocumentDB I/O-Optimized is the ideal choice when you need predictable costs or have I/O intensive applications. If you expect your I/O costs to exceed 25% of your total Amazon DocumentDB database costs, this option offers enhanced price performance. Refer to our Amazon DocumentDB I/O-Optimized documentation to learn more, including how to get started.
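The 25% guideline is a ratio check on your bill. The Python sketch below computes the I/O share of total Amazon DocumentDB spend and compares it with the threshold. The function name and the dollar figures in the usage note are hypothetical.

```python
def should_consider_io_optimized(io_cost, total_cost, threshold=0.25):
    """Return True when I/O charges exceed `threshold` of total spend."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return io_cost / total_cost > threshold
```

For example, 300 in I/O charges on a 1,000 total bill is a 30% share, so the check returns True; a 20% share returns False.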

You can switch your existing database clusters once every 30 days to Amazon DocumentDB I/O-Optimized. You can switch back to Amazon DocumentDB standard storage configurations at any time.

Yes, the charges for the I/O operations required to replicate data across regions continue to apply. Amazon DocumentDB I/O-Optimized does not charge for read and write I/O operations, which is different from data replication. Refer to our Amazon DocumentDB I/O-Optimized documentation to learn more.

Elastic Clusters

Amazon DocumentDB Elastic Clusters enables you to elastically scale your document database to handle millions of writes and reads, with petabytes of storage capacity. Elastic Clusters simplifies how customers interact with Amazon DocumentDB by automatically managing the underlying infrastructure and removing the need to create, remove, upgrade, or scale instances.

You can create an Elastic Clusters cluster using the Amazon DocumentDB API, SDK, CLI, CloudFormation (CFN), or the AWS console. When provisioning your cluster, you specify the number of shards and the compute per shard that your workload needs. Once the cluster is created, you can connect to it and read or write data from your application. Depending on your workload’s needs, you can add or remove compute by modifying your shard count and/or compute per shard using the AWS console, API, CLI, or SDK. Elastic Clusters will automatically provision/de-provision the underlying infrastructure and rebalance your data.

Elastic Clusters uses sharding to partition data across Amazon DocumentDB’s distributed storage system. Sharding, also known as partitioning, splits large data sets into small data sets across multiple nodes enabling customers to scale out their database beyond vertical scaling limits of a single database. Elastic Clusters utilizes the separation of compute and storage in Amazon DocumentDB. Rather than re-partitioning collections by moving small chunks of data between compute nodes, Elastic Clusters can copy data efficiently within the distributed storage system.

Elastic Clusters supports hash-based partitioning.

With Elastic Clusters, you can easily scale out or scale in your workload on Amazon DocumentDB, typically with little to no application downtime or impact to performance regardless of data size. A similar operation on MongoDB would impact application performance and take hours, and in some cases days. Elastic Clusters also offers differentiated management capabilities such as no-impact backups and rapid point-in-time restore, enabling customers to focus more time on their applications rather than managing their database.

No. You do not need to make any changes to your application to use Elastic Clusters.

No, in the near-term, you can use AWS Database Migration Service (AWS DMS) to migrate data from an existing Amazon DocumentDB instance cluster to an Elastic Clusters cluster.

Choosing an optimal shard key for Elastic Clusters is no different than for other databases. A great shard key has two characteristics: high frequency and high cardinality. For example, if your application stores user_orders in DocumentDB, you generally retrieve the data by user, so you want all orders for a given user to be in one shard. In this case, user_id would be a good shard key.
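To see why user_id keeps a user's orders together under hash-based partitioning, consider the Python sketch below. It hashes the shard key value and takes the result modulo the shard count. MD5 and the modulo step are stand-ins chosen for illustration; the hash function Elastic Clusters actually uses is not described here.

```python
import hashlib


def shard_for(key_value, shard_count):
    """Map a shard key value to a shard index (illustrative hash only)."""
    digest = hashlib.md5(str(key_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Hypothetical orders: every order with the same user_id lands on one shard.
orders = [
    {"order_id": 1, "user_id": "user-42"},
    {"order_id": 2, "user_id": "user-42"},
    {"order_id": 3, "user_id": "user-7"},
]
placement = {o["order_id"]: shard_for(o["user_id"], 4) for o in orders}
```

A key with many distinct values (high cardinality) gives the hash more values to spread across shards, while a low-cardinality key would concentrate documents on a few shards.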

Elastic Clusters integrates with other AWS services in the same way DocumentDB does today. First, you can use AWS Database Migration Service (DMS) to migrate from MongoDB and from relational databases to Elastic Clusters. Second, you can monitor the health and performance of your Elastic Clusters cluster using Amazon CloudWatch. Third, you can set up authentication and authorization through AWS IAM users and roles and use AWS VPC for secure VPC-only connections. Last, you can use AWS Glue to import and export data from/to other AWS services such as S3, Redshift, and OpenSearch.

Yes. You can migrate your existing MongoDB sharded workloads to Elastic Clusters. You can either use the AWS Database Migration Service or native MongoDB tools, such as mongodump and mongorestore, to migrate your MongoDB workload to Elastic Clusters. Elastic Clusters also supports MongoDB’s commonly used APIs, such as shardCollection(), giving you the flexibility to reuse existing tooling and scripts with Amazon DocumentDB.
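As an illustration of reusing shardCollection from a script, the Python sketch below builds the command document for a hashed shard key. The database, collection, and field names are hypothetical; consult the Elastic Clusters documentation for the options supported in your version.

```python
def shard_collection_command(database, collection, shard_key_field):
    """Build a shardCollection command document with a hashed key."""
    return {
        "shardCollection": f"{database}.{collection}",
        "key": {shard_key_field: "hashed"},
    }


# Usage with a connected PyMongo client (assumed, not shown here):
#   client.admin.command(shard_collection_command("shop", "user_orders", "user_id"))
command = shard_collection_command("shop", "user_orders", "user_id")
```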

Backup and restore

Automated backups are always enabled on Amazon DocumentDB clusters. Amazon DocumentDB enables point-in-time recovery for your clusters. You can increase your backup window for point-in-time restores up to 35 days. Backups do not impact database performance. To learn more, see Backing up and restoring in Amazon DocumentDB.
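Changing the retention window from an SDK might look like the Python sketch below, which validates the value and builds the arguments for a boto3 "docdb" modify_db_cluster call. The lower bound of 1 day is an assumption based on backups always being enabled, and the cluster identifier is hypothetical.

```python
def backup_retention_params(cluster_id, days):
    """Build modify_db_cluster arguments for a new retention window.

    The 35-day ceiling comes from the text above; the 1-day floor is
    an assumption.
    """
    if not 1 <= days <= 35:
        raise ValueError("retention must be between 1 and 35 days")
    return {"DBClusterIdentifier": cluster_id, "BackupRetentionPeriod": days}


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("docdb").modify_db_cluster(**backup_retention_params("orders", 35))
```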

Yes. Manual snapshots can be retained beyond the backup window and there is no performance impact when taking snapshots. Note that restoring data from cluster snapshots requires creating a new cluster.

Amazon DocumentDB automatically makes your data durable across three Availability Zones (AZs) within a Region and will automatically attempt to recover your instance in a healthy AZ with no data loss. In the unlikely event your data is unavailable within Amazon DocumentDB storage, you can restore from a cluster snapshot or perform a point-in-time restore operation to a new cluster. Note that the latest restorable time for a point-in-time restore operation can be up to five minutes in the past.

You can choose to create a final snapshot when deleting your instance. If you do, you can use this snapshot to restore the deleted instance at a later date. Amazon DocumentDB retains this final user-created snapshot along with all other manually created snapshots after the instance is deleted. Only snapshots are retained after the instance is deleted (i.e., automated backups created for point-in-time restore are not kept).

Yes. Amazon DocumentDB gives you the ability to create snapshots of your cluster, which you can use later to restore a cluster. You can share a snapshot with a different AWS account, and the owner of the recipient account can use your snapshot to restore a cluster that contains your data. You can even choose to make your snapshots public – that is, anybody can restore a cluster containing your (public) data. You can use this feature to share data between your various environments (production, dev/test, staging, etc.) that have different AWS accounts, as well as keep backups of all your data secure in a separate account in case your main AWS account is ever compromised.

There is no charge for sharing snapshots between accounts. However, you may be charged for the snapshots themselves, as well as any clusters that you restore from shared snapshots.

We do not support sharing automatic cluster snapshots. To share an automatic snapshot, you must manually create a copy of the snapshot, and then share the copy.

No. Your shared Amazon DocumentDB snapshots will only be accessible by accounts in the same region as the account that shares them.

Yes. You can share encrypted Amazon DocumentDB snapshots. The recipient of the shared snapshot must have access to the KMS key that was used to encrypt the snapshot.

No. Amazon DocumentDB snapshots can only be used inside of the service.

You can choose to create a final snapshot when deleting your cluster. If you do, you can use this snapshot to restore the deleted cluster at a later date. Amazon DocumentDB retains this final user-created snapshot along with all other manually created snapshots after the cluster is deleted.

Resiliency

Amazon DocumentDB automatically divides your storage volume into 10 GiB segments spread across many disks. We make your data durable across three Availability Zones (AZs) and you only pay for one copy. Amazon DocumentDB is designed to transparently handle the loss of up to two copies of data without affecting write availability and up to three copies without affecting read availability. Amazon DocumentDB storage volume is also self-healing. Data blocks and disks are continuously scanned for errors and repaired automatically.

Unlike other databases, after a database crash, Amazon DocumentDB does not need to replay the redo log from the last database checkpoint (typically five minutes) and confirm that all changes have been applied before making the database available for operations. This reduces database restart times to less than 60 seconds in most cases. Amazon DocumentDB moves the cache out of the database process and makes it available immediately at restart time. This prevents you from having to throttle access until the cache is repopulated to avoid brownouts.

Amazon DocumentDB supports read replicas, which share the same underlying storage volume as the primary instance. Updates made by the primary instance are visible to all Amazon DocumentDB replicas. You can configure up to 15 read replicas. Replication is asynchronous and typically completes in milliseconds, with low impact on the performance of the primary instance. To learn more, see Amazon DocumentDB High availability and replication.

Yes, you can replicate your data across Regions using the Global Clusters feature. Global Clusters span multiple AWS Regions and replicate your data to clusters in up to five Regions with little to no impact on performance. Global Clusters provide faster recovery from Region-wide outages and enable low-latency global reads. To learn more, see the Global Clusters feature page and blog post.

Yes. You can assign a promotion priority tier to each instance on your cluster. If the primary instance fails, Amazon DocumentDB will promote the replica with the highest priority to primary. If there are inconsistencies between two or more replicas in the same priority tier, then Amazon DocumentDB will promote the replica that is the same size as the primary instance.

You can modify the priority tier for an instance at any time. Simply modifying priority tiers will not trigger a failover.

You can assign lower priority tiers to replicas that you do not want promoted to the primary instance. However, if the higher priority replicas on the cluster are unhealthy or unavailable for some reason, then Amazon DocumentDB will promote the lower priority replica.
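The promotion rules described above can be sketched as a small selection function. This Python model assumes that a lower tier number means a higher priority and that unhealthy replicas are skipped; the Replica type and its field names are hypothetical, and the real service may weigh additional factors.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    tier: int            # assumption: lower number = higher priority
    instance_class: str
    healthy: bool = True


def choose_failover_target(replicas, primary_class):
    """Pick the replica to promote, following the rules in the text."""
    candidates = [r for r in replicas if r.healthy]
    if not candidates:
        return None
    best_tier = min(r.tier for r in candidates)
    top = [r for r in candidates if r.tier == best_tier]
    # Tie within a tier: prefer a replica sized like the primary.
    same_size = [r for r in top if r.instance_class == primary_class]
    return (same_size or top)[0]
```

A low-priority replica is still chosen when every higher-priority replica is unhealthy, which matches the fallback behavior described above.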

Amazon DocumentDB can be deployed in a high-availability configuration by using replica instances in multiple AWS Availability Zones as failover targets. In the event of a primary instance failure, a replica instance is automatically promoted to be the new primary with minimal service interruption.

You can add additional Amazon DocumentDB replicas. Amazon DocumentDB replicas share the same underlying storage as the primary instance. Any Amazon DocumentDB replica can be promoted to become primary without any data loss and therefore can be used for enhancing fault tolerance in the event of a primary instance failure. To increase cluster availability, simply create one to 15 replicas, in multiple AZs, and Amazon DocumentDB will automatically include them in failover primary selection in the event of an instance outage.

Failover is automatically handled by Amazon DocumentDB so that your applications can resume database operations as quickly as possible without manual administrative intervention.

  • If you have an Amazon DocumentDB replica instance in the same or a different Availability Zone, when failing over, Amazon DocumentDB flips the canonical name record (CNAME) for your instance to point at the healthy replica, which is in turn promoted to become the new primary. Start-to-finish, failover typically completes within 30 seconds. 
  • If you do not have an Amazon DocumentDB replica instance (i.e. a single instance cluster), Amazon DocumentDB will attempt to create a new instance in the same Availability Zone as the original instance. This replacement of the original instance is done on a best-effort basis and may not succeed, for example, if there is an issue that is broadly affecting the Availability Zone. 

Your application should retry database connections in the event of connection loss.
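One common way to retry is exponential backoff. The Python sketch below wraps any operation and retries it on connection errors; the attempt count, delays, and the choice of ConnectionError are illustrative assumptions. With a MongoDB driver you would likely pass the driver's own reconnect exception, for example pymongo.errors.AutoReconnect, in retry_on.

```python
import time


def with_retries(operation, attempts=5, base_delay=0.5,
                 retry_on=(ConnectionError,), sleep=time.sleep):
    """Call `operation`, retrying with exponential backoff on failure."""
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

Because failover typically completes within about 30 seconds, the total backoff budget should comfortably exceed that window.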

Amazon DocumentDB will automatically detect a problem with your primary instance and begin routing your read/write traffic to an Amazon DocumentDB replica instance. On average, this failover will complete within 30 seconds. In addition, the read traffic that your Amazon DocumentDB replica instances were serving will be briefly interrupted.

Since Amazon DocumentDB replicas share the same data volume as the primary instance, there is virtually no replication lag. We typically observe lag times in the tens of milliseconds.

Security and compliance

Yes. All Amazon DocumentDB instances must be created in an Amazon VPC. With Amazon VPC, you can define a virtual network topology that closely resembles a traditional network in your own datacenter. This gives you complete control over who can access your Amazon DocumentDB instances.

Amazon DocumentDB supports RBAC with built-in roles. RBAC enables you to enforce least privilege as a best practice by restricting the actions that users are authorized to perform. For more information, see Amazon DocumentDB role-based access control.

Amazon DocumentDB utilizes Amazon VPC to enforce strict network and authorization boundaries. Authentication and authorization for Amazon DocumentDB management APIs is provided by IAM users, roles, and policies. Authentication to an Amazon DocumentDB database is done via standard MongoDB tools and drivers with Salted Challenge Response Authentication Mechanism (SCRAM), the default authentication mechanism for MongoDB.

Yes. Amazon DocumentDB allows you to encrypt your clusters using keys you manage through AWS Key Management Service (KMS). On a cluster running with Amazon DocumentDB encryption, data stored at rest in the underlying storage is encrypted, as are its automated backups, snapshots, and replicas in the same cluster. Encryption and decryption are handled seamlessly. For more information about the use of KMS with Amazon DocumentDB, see Encrypting Amazon DocumentDB Data at Rest.

Currently, encrypting an existing unencrypted Amazon DocumentDB instance is not supported. To use Amazon DocumentDB encryption for an existing unencrypted cluster, create a new cluster with encryption enabled and migrate your data into it.

Amazon DocumentDB was designed to meet the highest security standards and to make it easy for you to verify our security and meet your own regulatory and compliance obligations. It has been assessed to comply with PCI DSS, ISO 9001, 27001, 27017, and 27018, SOC 1, 2 and 3, and Health Information Trust Alliance (HITRUST) Common Security Framework (CSF) certification, in addition to being HIPAA eligible. AWS compliance reports are available for download in AWS Artifact.

Major version upgrade

In-place major version upgrade (MVU) lets you upgrade Amazon DocumentDB 3.6 or 4.0 clusters to Amazon DocumentDB 5.0 using the AWS Console, Software Development Kit (SDK), or Command Line Interface (CLI). With in-place MVU, there is no need to create new clusters or change your endpoints. In-place MVU is available starting with Amazon DocumentDB version 5.0. To get started, please review the in-place MVU documentation.

In-place MVU lets you seamlessly upgrade your Amazon DocumentDB 3.6 or 4.0 clusters to version 5.0 without the need to perform backup and restore to another cluster and without using other data migration tools. In doing so, it reduces the time and effort associated with the usual upgrade process, which entails configuring the source and target endpoints, migrating indexes and data, changing application code, and more.

You won't need to change the endpoint in your applications after the upgrade. Since the data stays in the same cluster, there is no additional cost to upgrade using this feature.

Downtime can vary from cluster to cluster depending on the number of collections, indexes, databases, and instances. Before running an in-place major version upgrade on your production cluster, we strongly recommend running it in a lower environment to test downtime and performance, and to verify that your applications work as expected after the upgrade.

You can also utilize the fast clone feature to clone your cluster data for testing. Depending on the complexity of your Amazon DocumentDB implementation, you can engage our database solutions architect for additional help.

In-place MVU is only supported with Amazon DocumentDB 3.6 or 4.0 as a source and version 5.0 as target. It is not supported for Amazon DocumentDB Global Clusters or Elastic Clusters or with DocumentDB 4.0 as target.
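These constraints can be encoded as a pre-flight check in an upgrade script. The Python sketch below is one way to do so; the cluster-type labels ("instance-based", "global", "elastic") are hypothetical names chosen for this example, not values returned by any API.

```python
SUPPORTED_SOURCES = {"3.6", "4.0"}
SUPPORTED_TARGET = "5.0"
UNSUPPORTED_CLUSTER_TYPES = {"global", "elastic"}  # hypothetical labels


def supports_in_place_mvu(source, target, cluster_type="instance-based"):
    """Return True when the upgrade path matches the rules in the text."""
    return (
        source in SUPPORTED_SOURCES
        and target == SUPPORTED_TARGET
        and cluster_type not in UNSUPPORTED_CLUSTER_TYPES
    )
```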

Generative AI

Vector search is a method used in machine learning (ML) to find similar data points to a given data point by comparing their vector representations using distance or similarity metrics. The closer the two vectors are in the vector space, the more similar the underlying items are considered to be. This technique helps capture the meaning or semantics of the data. This approach is useful in various applications, such as recommendation systems, natural language processing, and image recognition.
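As a small worked example of a similarity metric, the Python sketch below computes cosine similarity and uses it for a brute-force nearest-item lookup. The two-dimensional vectors and item names are toy values invented for illustration; real embeddings come from a model and have many more dimensions, and a production system would use a vector index rather than scanning every item.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def most_similar(query, items):
    """Return the key in `items` whose vector is closest to `query`."""
    return max(items, key=lambda name: cosine_similarity(query, items[name]))


# Toy embeddings (made up for this example).
catalog = {"crimson dress": [0.9, 0.2], "blue jeans": [0.1, 0.9]}
best = most_similar([1.0, 0.1], catalog)
```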

Vector search for Amazon DocumentDB combines the flexibility and rich querying capability of a JSON-based document database with the power of vector search. You can use your existing Amazon DocumentDB data, or a flexible document data structure, to build machine learning and generative AI use cases such as semantic search experiences, product recommendations, personalization, chatbots, fraud detection, and anomaly detection. Visit the vector search for Amazon DocumentDB documentation to learn more.

Vector search for Amazon DocumentDB is available on Amazon DocumentDB 5.0 instance-based clusters.

Vector search for Amazon DocumentDB enables the use of semantic search so you can capture the meaning, context, and intent behind your data. Keyword search finds documents based on the actual text or pre-defined synonym mappings. For example, in a traditional e-commerce application, a search for “red dress” might return only products that have the words “red” and “dress” in their descriptions. Semantic search will also retrieve dresses in different shades of red, which can improve the user experience.

There is no additional cost to use vector search for Amazon DocumentDB. Standard compute, I/O, storage, and backup charges will apply as you store, index, and search vectors in Amazon DocumentDB. Visit the Amazon DocumentDB pricing page to learn more.

Amazon DocumentDB integrates with Amazon SageMaker Canvas, making it easy to build machine learning (ML) models and customize foundation models using data stored in Amazon DocumentDB without writing a single line of code. You no longer need to develop custom data and ML pipelines between Amazon DocumentDB and SageMaker Canvas. You can launch SageMaker Canvas from within the Amazon DocumentDB console and add existing Amazon DocumentDB databases as a data source to start building your machine learning models. You can use your data in DocumentDB in SageMaker Canvas to build models to predict customer churn, detect fraud, predict maintenance failures, forecast financial metrics and sales, optimize inventory, summarize content, and generate content.

The integration of Amazon DocumentDB with Amazon SageMaker Canvas makes it easy to build generative artificial intelligence (AI) and machine learning (ML) applications using data stored in Amazon DocumentDB. You no longer need to develop custom data and ML pipelines between Amazon DocumentDB and SageMaker Canvas. The in-console integration removes the undifferentiated heavy lifting to connect and access data to accelerate ML development with a low code no code (LCNC) experience. You can launch SageMaker Canvas from within the Amazon DocumentDB console and add existing Amazon DocumentDB databases as a data source.

Amazon SageMaker Canvas offers a no-code interface to build machine learning models using data from various data sources including Amazon DocumentDB. You are charged for your use of SageMaker Canvas and for the resulting I/Os when SageMaker Canvas reads data from your Amazon DocumentDB instance. There is no additional charge to use DocumentDB as a data source in Amazon SageMaker Canvas. Visit the Amazon DocumentDB pricing page and SageMaker Canvas pricing page to learn more.

Zero-ETL integration

The zero-ETL integration of Amazon DocumentDB with Amazon OpenSearch Service abstracts away the operational complexity of extracting, transforming, and loading (ETL) data from an Amazon DocumentDB collection to an Amazon OpenSearch Service managed cluster or serverless collection. With this integration, you no longer have to build or manage data pipelines or transform data.

If you want to use MongoDB APIs, you should use the native database capabilities in Amazon DocumentDB to perform vector search on your documents. The Amazon DocumentDB zero-ETL integration with Amazon OpenSearch Service is well suited for searching across collections and for storing and indexing vectors with more than 2,000 dimensions.

The zero-ETL integration of Amazon DocumentDB with Amazon OpenSearch Service uses Amazon OpenSearch Ingestion to seamlessly move operational data from Amazon DocumentDB to Amazon OpenSearch Service. To get started, you enable change stream functionality on the Amazon DocumentDB collection that needs to be replicated. The zero-ETL integration feature sets up an Amazon OpenSearch Ingestion pipeline in your account that automatically replicates the data to an Amazon OpenSearch Service managed cluster or serverless collection.
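The first step, enabling change streams on the collection, is done with an admin command. The Python sketch below builds that command document, assuming the modifyChangeStreams command described in the Amazon DocumentDB change streams documentation; the database and collection names are hypothetical.

```python
def enable_change_streams_command(database, collection):
    """Build the admin command that enables change streams on a collection."""
    return {
        "modifyChangeStreams": 1,
        "database": database,
        "collection": collection,
        "enable": True,
    }


# Usage with a connected PyMongo client (assumed, not shown here):
#   client.admin.command(enable_change_streams_command("shop", "user_orders"))
cs_command = enable_change_streams_command("shop", "user_orders")
```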

Amazon OpenSearch Ingestion automatically understands the format of the data in Amazon DocumentDB collections and maps the data to Amazon OpenSearch Service to yield the most performant search results. You can synchronize data from multiple Amazon DocumentDB collections via multiple pipelines into one Amazon OpenSearch managed cluster or serverless collection to offer holistic insights across several applications. Optionally, you can specify custom data processors when defining the ingestion configuration in Amazon OpenSearch Service. Subsequent updates to Amazon DocumentDB collections are also replicated to Amazon OpenSearch Service without any manual intervention.

This zero-ETL integration leverages the native data transformation capabilities of Amazon OpenSearch Ingestion pipelines to aggregate and filter the data while it is in motion.

You can also write custom transformation logic if you want bespoke transformation capability, and Amazon OpenSearch Ingestion will manage the transformation process. Alternatively, if you want to move all data from source to sink without customization, Amazon OpenSearch Ingestion provides out-of-the-box blueprints so that you can perform the integration with just a few clicks.

In order to ensure that Amazon OpenSearch Ingestion has the necessary permissions to replicate data from Amazon DocumentDB, the zero-ETL integration feature creates an IAM role with the necessary permissions to read data from Amazon DocumentDB collection and write to an Amazon OpenSearch domain or collection. This role is then assumed by Amazon OpenSearch Ingestion pipelines to ensure that the right security posture is always maintained when moving the data from source to destination.

You can view all the metrics related to your zero-ETL integration with Amazon DocumentDB on the console dashboards provided by Amazon DocumentDB and the OpenSearch Ingestion pipeline. You can also query real-time logs in Amazon CloudWatch and set up custom CloudWatch alarms that are triggered when user-defined thresholds are breached.