AWS Database Blog

Analyze Amazon DocumentDB workloads with Performance Insights

Amazon DocumentDB (with MongoDB compatibility) is a fast, reliable, and fully managed database service. Amazon DocumentDB makes it easy to set up, operate, and scale MongoDB API-compatible databases in the cloud. With Amazon DocumentDB, you can run the same application code and use the same drivers and tools that you use with MongoDB.

Performance Insights adds to the existing Amazon DocumentDB monitoring features to illustrate your cluster performance and help you analyze any issues that affect it. With the Performance Insights dashboard, you can visualize the database load and filter the load by waits, query statements, hosts, or application. Performance Insights is included with Amazon DocumentDB instances and stores seven days of performance history in a rolling window at no additional cost.

Solution overview

In this post, we enable Performance Insights on the instances of an existing Amazon DocumentDB cluster. We load sample data into the cluster and run a sample workload to generate load on the database. Then we use Performance Insights to visualize the load on the Amazon DocumentDB instance and identify the queries, hosts, databases, and applications causing the load. Lastly, we create an index to improve the query performance and use Performance Insights to visualize the reduced load on the instance.

Prerequisites

To implement this solution, you must have the following prerequisites:

  • An AWS Cloud9 environment where you can load sample data to your Amazon DocumentDB cluster and run Python scripts to generate database load. You can use an existing AWS Cloud9 environment or create a new one.
  • A security group that enables you to connect to your Amazon DocumentDB cluster from your AWS Cloud9 environment. You can use an existing security group or create a new one.
  • An Amazon DocumentDB cluster with at least one db.t3.medium instance. You can use an existing Amazon DocumentDB cluster or create a new one. This post assumes the default values for the port (27017) and TLS (enabled) settings.
  • A mongo shell (or similar tool) to perform administrative actions on the database.

Enable Performance Insights

Performance Insights is not enabled by default. You can enable Performance Insights using the AWS Management Console or AWS Command Line Interface (AWS CLI) when creating a new cluster or by modifying instances in an existing cluster. You can disable it later if necessary. Enabling and disabling Performance Insights doesn’t cause downtime, a reboot, or a failover.
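
If you want to script this step instead of using the console, the following is a minimal sketch that enables Performance Insights on each instance with the AWS SDK for Python (Boto3). The instance identifiers are placeholders; replace them with the instances in your cluster.

import boto3

docdb = boto3.client("docdb")

# Placeholder identifiers -- replace with the instance identifiers in your cluster.
for instance_id in ["docdb-instance-1", "docdb-instance-2"]:
    docdb.modify_db_instance(
        DBInstanceIdentifier=instance_id,
        EnablePerformanceInsights=True,
        ApplyImmediately=True,
    )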

To enable Performance Insights through the console, complete the following steps for each instance in your existing cluster:

  • On the Amazon DocumentDB console, choose Clusters in the navigation pane.
  • Select the Amazon DocumentDB instance from the cluster, and on the Actions menu, choose Modify.

  • Select Enable Performance Insights in the Performance Insights section.
  • Choose Continue and complete the configuration changes.

You see the summary of changes and the option to select when you want these modifications to take place.

  • Select your preference of when to apply modifications (for this post, we choose Apply immediately).
  • Choose Modify instance.

  • Repeat these steps to enable Performance Insights on other instances in your cluster.

Now we’re ready to configure the AWS Cloud9 environment, load sample data, and create a database load on Amazon DocumentDB.

Configure the AWS Cloud9 environment

To configure your AWS Cloud9 environment, complete the following steps:

  • In the AWS Cloud9 IDE, on the Window menu, choose New Terminal.

  • Run the following commands to install the pymongo driver and the mongo shell:
    sudo pip install pymongo
    echo -e "[mongodb-org-4.0] \nname=MongoDB Repository\nbaseurl=https://repo.mongodb.org/yum/amazon/2013.03/mongodb-org/4.0/x86_64/\ngpgcheck=1 \nenabled=1 \ngpgkey=https://www.mongodb.org/static/pgp/server-4.0.asc" | sudo tee /etc/yum.repos.d/mongodb-org-4.0.repo
    sudo yum install -y mongodb-org-shell
    sudo yum install -y mongodb-org-tools
  • Download the .pem certificate bundle to the AWS Cloud9 IDE for a TLS-enabled connection to Amazon DocumentDB:
    wget -c https://truststore.pki.rds.amazonaws.com/global/global-bundle.pem
  • Set the environment variables, replacing the placeholders with your cluster's values:
    echo "export docdbEndpoint=<Amazon DocumentDB cluster endpoint>" >> /home/ec2-user/.bashrc
    echo "export docdbUser=<Amazon DocumentDB cluster master user name>" >> /home/ec2-user/.bashrc
    echo "export docdbPass=<Amazon DocumentDB cluster master password>" >> /home/ec2-user/.bashrc
    source /home/ec2-user/.bashrc

For more information on finding your Amazon DocumentDB cluster endpoint, see Finding a Cluster’s Endpoints.
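
You can also look up the endpoints programmatically. The following is a short Boto3 sketch; the cluster identifier is a placeholder for your own cluster.

import boto3

docdb = boto3.client("docdb")

# "sample-cluster" is a placeholder -- use your cluster identifier.
cluster = docdb.describe_db_clusters(DBClusterIdentifier="sample-cluster")["DBClusters"][0]
print(cluster["Endpoint"])        # cluster (writer) endpoint
print(cluster["ReaderEndpoint"])  # reader endpoint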

Load sample data into Amazon DocumentDB

Run the following commands to download the sample data into your AWS Cloud9 IDE and import this sample data into an Amazon DocumentDB collection named customers in a database named pi:

wget https://github.com/aws-samples/amazon-documentdb-samples/raw/master/datasets/pi-data-part1.json.zip

wget https://github.com/aws-samples/amazon-documentdb-samples/raw/master/datasets/pi-data-part2.json.zip

unzip pi-data-part1.json.zip
unzip pi-data-part2.json.zip

mongoimport --db pi --collection customers -h $docdbEndpoint:27017 -u $docdbUser -p $docdbPass --ssl --sslCAFile global-bundle.pem --file pi-data-part1.json
mongoimport --db pi --collection customers -h $docdbEndpoint:27017 -u $docdbUser -p $docdbPass --ssl --sslCAFile global-bundle.pem --file pi-data-part2.json

Use the following commands to connect to the Amazon DocumentDB cluster using the mongo shell, validate the count of documents loaded into the customers collection, and create the necessary indexes on the customers collection:

mongo --ssl --host $docdbEndpoint:27017 --sslCAFile global-bundle.pem --username $docdbUser --password $docdbPass

use pi
db.customers.count()
db.getCollection("customers").createIndex({"PetData.Pet":1})
db.getCollection("customers").createIndex({"State":1})

After the indexes have been created, exit the mongo shell:

exit

Generate the database load with Python scripts

To generate the database load, complete the following steps:

  • Use the following commands to download three Python scripts in your AWS Cloud9 IDE (a simplified sketch of what these scripts do follows this list):
    wget https://github.com/aws-samples/amazon-documentdb-samples/raw/master/blogs/performanceinsights-docdb/carMakesByPet.py
    wget https://github.com/aws-samples/amazon-documentdb-samples/raw/master/blogs/performanceinsights-docdb/petsByState.py
    wget https://github.com/aws-samples/amazon-documentdb-samples/raw/master/blogs/performanceinsights-docdb/statesByCarMake.py
  • Choose the plus sign next to the existing Terminal tab and choose New Terminal.
  • Open two more Terminal tabs so you have four Terminal tabs.

  • On the second Terminal tab, run the carMakesByPet.py Python script:
    python carMakesByPet.py
  • On the third Terminal tab, run the statesByCarMake.py Python script:
    python statesByCarMake.py
  • On the fourth Terminal tab, run the petsByState.py Python script:
    python petsByState.py
  • After you start the scripts, you can navigate to the Performance Insights dashboard to monitor the performance of the database. Choose the primary instance of your Amazon DocumentDB cluster from the Select Database Instance dropdown.

  • Choose 5m for the View past setting.
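
For reference, each of the downloaded scripts opens a connection with pymongo and repeatedly runs an aggregation against the customers collection. The following is a simplified, hypothetical sketch of a script like statesByCarMake.py, not the exact code in the repository; the list of car makers is made up for illustration.

import os
import random
import pymongo

# Reuse the environment variables exported earlier and the downloaded CA bundle.
client = pymongo.MongoClient(
    host=os.environ["docdbEndpoint"],
    port=27017,
    username=os.environ["docdbUser"],
    password=os.environ["docdbPass"],
    tls=True,
    tlsCAFile="global-bundle.pem",
    appname="statesByCarMake-py",  # shows up on the Top applications tab
)
customers = client["pi"]["customers"]

car_makers = ["Toyota", "Honda", "Ford", "BMW", "Audi"]  # illustrative values only

# Run the aggregation in a loop until the script is stopped with Ctrl+C.
while True:
    pipeline = [
        {"$match": {"CarData.CarMaker": random.choice(car_makers)}},
        {"$group": {"_id": "$State", "count": {"$sum": 1}}},
    ]
    list(customers.aggregate(pipeline))

Note the appname value passed to MongoClient; Performance Insights uses it to attribute load on the Top applications tab, as you'll see later.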

Monitor the database load using Performance Insights

Performance Insights uses the concept of database load (DB load), measured in average active sessions (AAS), which is the average number of sessions that are actively running queries or waiting on the database. For example, a sustained DB load of 4 AAS on an instance with 2 vCPUs suggests that sessions are regularly waiting for CPU or other resources. The Database load chart shows how the database activity compares to instance capacity, represented by the Max vCPUs line. By default, the stacked line chart represents DB load as average active sessions per unit of time, sliced (grouped) by wait states.

You can also slice the database load by query. In the following graph, each actively running query is represented by a different color. We can observe that one particular query contributes the most to the database load.

You can also view the load as active sessions grouped by any of the supported dimensions:

  • Top waits – Events for which the database backend is waiting. See Dimensions for descriptions of the various Amazon DocumentDB wait states. The following screenshot of the Top waits tab also shows the same query (green) contributing the most to the CPU wait state.

  • Top queries – This tab lists the queries causing load on the database.

When you select the top query, the Database load chart is scoped to the load caused by that query. The following screenshot shows the top query digest causing the load, which in this example is {"aggregate":"customers","pipeline":[{"$match":{"CarData.CarMaker":"?"}}....

To see the literals of the child queries, choose the plus sign to expand the query.

  • Top hosts – This tab shows the IP address and port of top clients causing load on the database. Because we triggered the workload from just one AWS Cloud9 environment, all the IP addresses in the following screenshot are the same.

  • Top databases – This tab shows the top databases causing load on the database. In this example, we are running the workload only in the pi database in the cluster.

  • Top applications – This tab shows the top applications causing load on the database. The Python scripts supply the application name through the appName connection parameter when connecting to the Amazon DocumentDB cluster, as shown in the earlier script sketch. The following screenshot shows that all the load generated by this query comes from the statesByCarMake-py application.

Address the cause of database load

With the help of Performance Insights, you can identify the query causing the most load on the database. Earlier, you created indexes for aggregations matching on the State and PetData.Pet fields. To reduce the load caused by the aggregation matching on the CarData.CarMaker field, create an index on the CarData.CarMaker field. On the first AWS Cloud9 Terminal tab, create the index:

mongo --ssl --host $docdbEndpoint:27017 --sslCAFile global-bundle.pem --username $docdbUser --password $docdbPass

use pi 
db.getCollection("customers").createIndex({"CarData.CarMaker":1}) 
exit

Return to Performance Insights and slice the Database load chart by query. You will see the createIndex() statement causing load while the new index is built.

After the index has finished building, you will see the load generated by the {"aggregate":"customers","pipeline":[{"$match":{"CarData.CarMaker":"?"}},{"$group... query begin to drop.
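
To confirm that the aggregation now uses the new index, you can run the explain command against the same pipeline. The following is a minimal pymongo sketch (the match value is a placeholder); look for an index scan stage on CarData.CarMaker in the output.

import os
import pymongo

client = pymongo.MongoClient(
    host=os.environ["docdbEndpoint"],
    port=27017,
    username=os.environ["docdbUser"],
    password=os.environ["docdbPass"],
    tls=True,
    tlsCAFile="global-bundle.pem",
)
db = client["pi"]

# Ask Amazon DocumentDB to explain the aggregation instead of running it.
plan = db.command({
    "explain": {
        "aggregate": "customers",
        "pipeline": [{"$match": {"CarData.CarMaker": "Toyota"}}],  # placeholder value
        "cursor": {},
    }
})
print(plan)  # the winning plan should include an index scan on CarData.CarMaker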

Clean up

To avoid ongoing costs, complete the following steps to clean up your resources:

  • If the Python scripts are still running, quit them by pressing Ctrl+C in each of the Terminal tabs.
  • Delete the pi database from the Amazon DocumentDB cluster using the mongo shell:
    mongo --ssl --host $docdbEndpoint:27017 --sslCAFile global-bundle.pem --username $docdbUser --password $docdbPass
    
    use pi
    db.dropDatabase()
  • If you used an existing Amazon DocumentDB cluster, disable Performance Insights on the cluster.
  • If you created a new Amazon DocumentDB cluster, delete the cluster.
  • If you created an AWS Cloud9 IDE, delete the environment.
  • If you created a new security group, delete the security group.

Conclusion

In this post, we used Performance Insights in Amazon DocumentDB to identify the load on the database caused by a sample workload, applied a performance improvement, and verified the improvement. Performance Insights saved us a lot of time investigating bottlenecks: it helped us visualize the load and determine when, where, and what caused the load on the database, reducing the investigation time from hours to minutes.

Leave a comment. We’d love to hear your thoughts and suggestions.


About the Authors

Neha Daudani is a Senior Solutions Architect at AWS. She helps customers design applications using modern techniques and best practices. She has extensive experience in the data and analytics space, designing enterprise data warehouses and reporting platforms.

Douglas Bonser is a Senior DocumentDB Specialist Solutions Architect based out of the Dallas/Ft. Worth area in Texas. He enjoys helping customers solve problems and modernize their applications using NoSQL database technology. Prior to joining AWS he has worked extensively with NoSQL and relational database technologies for over 30 years.