AWS Database Blog
10 Amazon Aurora tips for application developers – Part 2
This is the second post of the two-part series on how application developers can take advantage of Amazon Aurora features, capabilities, and complementary services. In Part 1, we shared the first five tips:
- Split the application SQL read/write load to enhance your app’s performance.
- For Java apps, use AWS JDBC drivers for faster app recovery from Aurora failover.
- Use cloning to test applications against production data.
- Use Amazon Aurora Global Database for disaster recovery and cross-Region high performance reads.
- Get consistent application performance at lower cost with Amazon Aurora Serverless v2.
In this post, we share the next five tips, which show you how to enhance performance, security, and availability of your apps by taking advantage of AWS services that directly integrate with or complement Aurora databases.
Prerequisites
Make sure to review the prerequisites in Part 1. Although the examples in this post are shown for Amazon Aurora PostgreSQL-Compatible Edition, they may be ported for use with Amazon Aurora MySQL-Compatible Edition.
Tip #6: Integrate with other AWS services using AWS Lambda
You can invoke an AWS Lambda function from Amazon Aurora MySQL-Compatible Edition and Amazon Aurora PostgreSQL-Compatible Edition. This capability is exposed to database objects such as stored procedures and triggers. Amazon Aurora MySQL-Compatible Edition exposes native functions (lambda_sync, lambda_async) to run Lambda functions.
Amazon Aurora PostgreSQL-Compatible Edition requires the aws_lambda extension to be installed on the cluster. You can then use the extension's functions to invoke Lambda functions. You can set up the code in the Lambda function to integrate with any AWS service.
Lambda functions can be invoked synchronously or asynchronously from the SQL code. Synchronous calls block the SQL code from running until the Lambda function returns, which may have an impact on the application's performance. If you decide to use synchronous calls to Lambda functions, thoroughly test your application for performance and timeouts. Asynchronous calls to Lambda functions are non-blocking, but the SQL code doesn't receive the results from the function call. Synchronous calls are referred to as RequestResponse invocations, and asynchronous calls are referred to as Event invocations.
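For Aurora PostgreSQL, here is a minimal sketch of both invocation types, assuming the aws_lambda extension is already installed and a Lambda function named my-function (a placeholder) exists in the same Region:

```sql
-- Synchronous (RequestResponse) invocation: blocks until the function
-- returns, and the function's response is available to the SQL code
SELECT * FROM aws_lambda.invoke(
    aws_lambda.create_lambda_function_arn('my-function', 'us-east-1'),
    '{"body": "Hello from Aurora"}'::json);

-- Asynchronous (Event) invocation: returns immediately, no response payload
SELECT * FROM aws_lambda.invoke(
    aws_lambda.create_lambda_function_arn('my-function', 'us-east-1'),
    '{"body": "Hello from Aurora"}'::json,
    invocation_type := 'Event');
```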
This feature comes in handy when applications need to publish events on data changes. For example, consider a scenario in which, when a row is inserted into the orders table, an email confirming the order is sent to the customer via Amazon Simple Email Service (Amazon SES). To achieve this, an insert trigger on the orders table can invoke a Lambda function that sends the email.
If you decide to consider the Lambda invocation capability for your application, thoroughly test your application performance to ensure that the Lambda function invocation overhead (latency) is acceptable for your workload.
The following diagram illustrates this architecture.
Although the Lambda function may be outside of the database VPC, we recommend creating the Lambda function in the same VPC as your database instances. This simplifies the setup and provides better performance due to lower overhead.
To invoke a Lambda function from Aurora, complete the following steps:
- Ensure that the DB instances are created in a subnet with a NAT gateway; otherwise, a VPC interface endpoint for Lambda is needed.
- Create a Lambda function in the same Region as the cluster. You may use the boilerplate Hello from Lambda sample.
- Set up an AWS Identity and Access Management (IAM) role to allow the DB cluster to invoke the Lambda function (example commands follow these steps):
- Create an IAM policy that allows invoking the Lambda function.
- Create an IAM role that Aurora can assume.
- Attach the IAM policy to the role.
- Add the IAM role to the DB cluster for Lambda invocation (adjust the DB cluster identifier accordingly).
- Use the following code for testing on PostgreSQL (see the SQL sketch after these steps):
- Install the aws_lambda extension.
- Run the aws_lambda.invoke(..) function to invoke the Lambda function.
- When you're done experimenting, clean up the environment (cleanup commands are included in the sketch after these steps):
- Delete the Lambda function.
- Remove the role from the database.
- Detach the policy from the role.
- Delete the role.
- Delete the policy.
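Here is a consolidated sketch of the AWS CLI commands referenced in the preceding steps, run from the bastion host shell. The policy, role, cluster, and function names (aurora-lambda-policy, aurora-lambda-role, aurora-dev-tips, my-function), the Region, and the account ID 123456789012 are placeholders; adjust them for your environment:

```bash
# Create an IAM policy that allows invoking the Lambda function
aws iam create-policy --policy-name aurora-lambda-policy \
  --policy-document '{"Version": "2012-10-17", "Statement": [{"Effect": "Allow", "Action": "lambda:InvokeFunction", "Resource": "arn:aws:lambda:us-east-1:123456789012:function:my-function"}]}'

# Create a role that Aurora (RDS) can assume
aws iam create-role --role-name aurora-lambda-role \
  --assume-role-policy-document '{"Version": "2012-10-17", "Statement": [{"Effect": "Allow", "Principal": {"Service": "rds.amazonaws.com"}, "Action": "sts:AssumeRole"}]}'

# Attach the policy to the role
aws iam attach-role-policy --role-name aurora-lambda-role \
  --policy-arn arn:aws:iam::123456789012:policy/aurora-lambda-policy

# Associate the role with the DB cluster for the Lambda feature
aws rds add-role-to-db-cluster --db-cluster-identifier aurora-dev-tips \
  --role-arn arn:aws:iam::123456789012:role/aurora-lambda-role \
  --feature-name Lambda

# Cleanup: remove the role from the cluster, detach the policy,
# and delete the role and the policy
aws rds remove-role-from-db-cluster --db-cluster-identifier aurora-dev-tips \
  --role-arn arn:aws:iam::123456789012:role/aurora-lambda-role --feature-name Lambda
aws iam detach-role-policy --role-name aurora-lambda-role \
  --policy-arn arn:aws:iam::123456789012:policy/aurora-lambda-policy
aws iam delete-role --role-name aurora-lambda-role
aws iam delete-policy --policy-arn arn:aws:iam::123456789012:policy/aurora-lambda-policy
```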
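And a sketch of the PostgreSQL test from psql, again treating my-function and the Region as placeholders:

```sql
-- Install the extension (one time, as a user with the rds_superuser role)
CREATE EXTENSION IF NOT EXISTS aws_lambda CASCADE;

-- Invoke the Lambda function synchronously and inspect the response
SELECT * FROM aws_lambda.invoke(
    aws_lambda.create_lambda_function_arn('my-function', 'us-east-1'),
    '{"body": "Hello from Aurora PostgreSQL"}'::json);
```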
Tip #7: Run machine learning on transactional data and run predictions
Amazon SageMaker enables you to build, train, and deploy custom machine learning (ML) models quickly. Amazon Comprehend is a natural language processing (NLP) service that uses ML models to find insights in text. You can use Amazon Comprehend to determine the sentiment of content in UTF-8 encoded text documents.
Aurora integrates directly with SageMaker and with Amazon Comprehend for sentiment analysis. Amazon Aurora MySQL-Compatible Edition exposes native functions for using this capability from SQL code in stored procedures and triggers. Amazon Aurora PostgreSQL-Compatible Edition requires the aws_ml extension to be installed on the cluster.
Consider a scenario in which you want to run sentiment analysis on data in the transactional database. For example, you can run sentiment analysis on the comments on a blog post to determine whether your readers liked the post. One way to do this would be to export the data out of the database and run ML training on it to create a model for predictions. You could then use the model from the application code to do the sentiment analysis. With the Aurora integration with Amazon Comprehend, you can instead use a pre-trained model to do the sentiment analysis on the data in the database. The integration saves you the ML training effort and doesn't require any ML skills.
The following diagram illustrates this architecture.
Let’s try out sentiment analysis on data in a table. Complete the following steps:
- Ensure that DB instances are created in a subnet with a NAT gateway; otherwise, a VPC interface endpoint for Amazon Comprehend is needed.
- Set up an IAM role to allow the DB cluster to access Amazon Comprehend (example commands follow these steps):
- Create a policy that allows invoking Amazon Comprehend features.
- Create a role that is assumed by Aurora.
- Attach the policy to the role.
- Add the IAM role to the DB cluster for Amazon Comprehend access (adjust the DB cluster identifier accordingly).
- Use the following code for testing on PostgreSQL (see the SQL sketch after these steps):
- Create a test table using psql.
- Populate the test table with some comments data.
- Invoke sentiment analysis on the data in the table; pay attention to the confidence level.
- When you're done experimenting, clean up the environment (cleanup commands are included in the sketch after these steps):
- Remove the role from the database.
- Detach the policy from the role.
- Delete the role.
- Delete the policy.
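As in Tip #6, here is a consolidated sketch of the setup and cleanup commands, with aurora-comprehend-policy, aurora-comprehend-role, aurora-dev-tips, and the account ID 123456789012 as placeholders:

```bash
# Create an IAM policy that allows calling the Comprehend sentiment APIs
aws iam create-policy --policy-name aurora-comprehend-policy \
  --policy-document '{"Version": "2012-10-17", "Statement": [{"Effect": "Allow", "Action": ["comprehend:DetectSentiment", "comprehend:BatchDetectSentiment"], "Resource": "*"}]}'

# Create a role that Aurora (RDS) can assume
aws iam create-role --role-name aurora-comprehend-role \
  --assume-role-policy-document '{"Version": "2012-10-17", "Statement": [{"Effect": "Allow", "Principal": {"Service": "rds.amazonaws.com"}, "Action": "sts:AssumeRole"}]}'

# Attach the policy to the role
aws iam attach-role-policy --role-name aurora-comprehend-role \
  --policy-arn arn:aws:iam::123456789012:policy/aurora-comprehend-policy

# Associate the role with the DB cluster for the Comprehend feature
aws rds add-role-to-db-cluster --db-cluster-identifier aurora-dev-tips \
  --role-arn arn:aws:iam::123456789012:role/aurora-comprehend-role \
  --feature-name Comprehend

# Cleanup when done: reverse the steps above
aws rds remove-role-from-db-cluster --db-cluster-identifier aurora-dev-tips \
  --role-arn arn:aws:iam::123456789012:role/aurora-comprehend-role --feature-name Comprehend
aws iam detach-role-policy --role-name aurora-comprehend-role \
  --policy-arn arn:aws:iam::123456789012:policy/aurora-comprehend-policy
aws iam delete-role --role-name aurora-comprehend-role
aws iam delete-policy --policy-arn arn:aws:iam::123456789012:policy/aurora-comprehend-policy
```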
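And a sketch of the sentiment analysis test from psql; the comments table and its sample rows are assumptions for this example:

```sql
-- Install the Aurora ML extension (one time)
CREATE EXTENSION IF NOT EXISTS aws_ml CASCADE;

-- Create and populate a test table with some comments
CREATE TABLE comments (
    comment_id   SERIAL PRIMARY KEY,
    comment_text VARCHAR(255) NOT NULL
);
INSERT INTO comments (comment_text) VALUES
    ('This is a very nice post, I learned a lot!'),
    ('I did not find this post useful at all.'),
    ('The post was okay, nothing special.');

-- Run sentiment analysis; pay attention to the confidence column
SELECT c.comment_text, s.sentiment, s.confidence
FROM comments c,
     aws_comprehend.detect_sentiment(c.comment_text, 'en') s;
```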
Tip #8: Use Amazon ElastiCache for better performance
With relational databases, there comes a point when further performance improvement may be costly or not even possible. Amazon ElastiCache is a managed caching service that supports Memcached and Redis. You can use ElastiCache as a caching layer for relational as well as NoSQL databases to achieve lower latency for data reads. An added benefit of using ElastiCache in front of Aurora is that it may also reduce the I/O cost associated with the database.
The following diagram illustrates this architecture.
In general, consider using ElastiCache for workloads that need low latency for reads and have complex joins or calculations that are resource intensive and may be materialized in a cache layer. With ElastiCache, you have two choices for your cache layer: Memcached, an in-memory-only cache, and Redis, which also offers durability. The implementation of the caching layer involves first checking the cache for the requested data and serving it from the cache if it's available; otherwise, you read from the database and add the result to the cache to speed up subsequent requests for the same data.
Let’s try out ElastiCache for Redis with Amazon Aurora PostgreSQL-Compatible Edition. In this test, you use Redis’s in-memory key-value storage feature. You may add key-value data to the Redis cluster for faster reads. Each of the key-value pairs has a time to live (TTL) attribute that specifies the number of seconds for which the key-value pair is available for reads.
In this test, we create a table and populate it with 10 million rows. We run SELECT on the table to get a count of rows with numbers that are divisible by 3. The count is cached in the Redis cluster with a TTL of 30 seconds. In case of cache hits, you will observe a considerable reduction in latency.
- On the ElastiCache console, choose Redis clusters in the navigation pane.
- Choose Create Redis cluster.
- Disable Multi-AZ (for a production cluster, you should always enable it).
- Disable auto-failover (for a production cluster, you should always enable it).
- For Node type, choose cache.t2.small.
- For Number of replicas, choose 0 (for a production cluster, always set it to greater than 0).
- For Subnet group, select Create new subnet group.
- For Subnet group name, enter a name.
- For VPC ID, choose the VPC in which you have the bastion host.
- Leave all other properties as default and create the cluster.
Now you can create a table and populate it with some data.
- Connect to your PostgreSQL instance using psql (see the SQL sketch after these steps).
- Set up the Python 3.x environment on your bastion host (example commands follow these steps):
- Install Python 3.
- Install the client packages for Redis and PostgreSQL.
- Copy the following code into a file (pgcache.py); a sketch is shown after these steps.
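A sketch of the table setup from psql (connect with psql -h <cluster endpoint> -U <user> first); the table name test_numbers is an assumption for this example:

```sql
-- Create a test table and populate it with 10 million rows
CREATE TABLE test_numbers (num BIGINT NOT NULL);
INSERT INTO test_numbers (num) SELECT generate_series(1, 10000000);

-- The query whose result we cache: the count of numbers divisible by 3
SELECT COUNT(*) FROM test_numbers WHERE num % 3 = 0;
```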
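To set up the Python environment on an Amazon Linux bastion host, something like the following should work:

```bash
sudo yum install -y python3
pip3 install --user redis psycopg2-binary
```

And here is a minimal sketch of pgcache.py implementing the cache-aside pattern described earlier. The environment variable names and the test_numbers table match the sketch above but are otherwise assumptions:

```python
import os
import time

import psycopg2
import redis

TTL_SECONDS = 30  # cached value expires after 30 seconds
CACHE_KEY = "count_divisible_by_3"
QUERY = "SELECT COUNT(*) FROM test_numbers WHERE num % 3 = 0"

# Connection details come from environment variables set in the shell
cache = redis.Redis(host=os.environ["REDIS_HOST"], port=6379)
db = psycopg2.connect(
    host=os.environ["PGHOST"],
    user=os.environ["PGUSER"],
    password=os.environ["PGPASSWORD"],
    dbname=os.environ.get("PGDATABASE", "postgres"),
)

start = time.time()
count = cache.get(CACHE_KEY)
if count is None:
    # Cache miss: read from Aurora, then cache the result with a TTL
    with db.cursor() as cur:
        cur.execute(QUERY)
        count = cur.fetchone()[0]
    cache.setex(CACHE_KEY, TTL_SECONDS, count)
    source = "database"
else:
    source = "cache"

print(f"count={int(count)} source={source} latency={time.time() - start:.4f}s")
```

To run it, set the connection details in the shell (placeholders in angle brackets) and invoke the script a few times:

```bash
export PGHOST=<Aurora writer endpoint>
export PGUSER=<database user>
export PGPASSWORD=<database password>
export REDIS_HOST=<Redis primary endpoint>
python3 pgcache.py
```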
Now you can run the Python code and compare the performance.
- Set up the environment variables in the bastion host shell (see the usage example above).
- Run the Python code.
- Run it a couple of times and observe the difference in latency. The cache TTL is 30 seconds, so you should see a big spike every 30 seconds. Feel free to change the TTL in the code and try again.
- When you're done with the experimentation, perform the following cleanup steps:
- Delete the ElastiCache for Redis cluster from the ElastiCache console.
- Set the Create backup option to No.
- Drop the table using psql.
Tip #9: Secure your DB credentials with AWS Secrets Manager
Applications access databases and other resources using credentials. Typically, these credentials are managed in the application in the form of property files, environment variables, or some other configuration management system. These traditional methods require the database owners to create the credentials and hand them over to the application development teams. This mechanism has multiple shortcomings: there's a high risk of credential misuse, rotating the credentials (passwords) requires manual intervention, and hand-offs and application changes are slow.
AWS Secrets Manager integrates with Aurora to address these challenges of database credentials management. Secrets Manager manages the credentials in the form of encrypted strings that are accessible only to authorized principals (IAM roles and users). Secrets are encrypted, and an IAM principal must have permission to use the encryption key to decrypt the secret. You can set up the password (stored as a secret) with automatic rotation, which rotates the password on a schedule set by the user. The rotation is carried out by an automatically created Lambda function that has permission to access the database and the secret.
The following diagram illustrates this architecture.
We highly recommend that you manage Aurora credentials in Secrets Manager, because it makes both your application and your database more secure.
Let's try out Secrets Manager. In this exercise, you create a secret for your Aurora cluster and then read the secret stored in Secrets Manager using Python code.
- On the Secrets Manager console, choose Secrets in the navigation pane.
- Choose Store a new secret.
- For Secret type, select Credentials for Amazon RDS database.
- Under Credentials, enter the user name and password for your PostgreSQL database.
- Under Database, select the Aurora PostgreSQL cluster.
- Choose Next.
- Set the secret name as test/auroratips/postgres.
- Optionally, set the secret rotation option.
For this post, we don’t rotate the PostgreSQL user password. As a best practice, always use this option for your production infrastructure.
- Set up the Python 3 environment in a bastion host shell, as in Tip #8.
- Set up the environment variables in a bastion shell (these are used by the sample code).
- Copy the following code into a file on the bastion host (getsecret.py); a sketch follows these steps.
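A minimal sketch of getsecret.py using boto3, assuming the secret name test/auroratips/postgres from the earlier step and an AWS_REGION environment variable:

```python
import json
import os

import boto3

SECRET_NAME = "test/auroratips/postgres"  # the secret created earlier

# Requires AWS credentials with permission to read the secret
client = boto3.client(
    "secretsmanager",
    region_name=os.environ.get("AWS_REGION", "us-east-1"),
)

response = client.get_secret_value(SecretId=SECRET_NAME)
secret = json.loads(response["SecretString"])

# RDS-type secrets store the connection details as JSON fields
print(f"host={secret['host']} port={secret['port']} user={secret['username']}")
# The password is in secret['password']; avoid printing it in real code
```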
We use this secret in the next exercise. If you don’t want to try out the next exercise, then follow the cleanup steps to delete the secret; otherwise, move on to the next tip.
- Delete the secret from the Secrets Manager console.
Secrets aren't deleted immediately; they're scheduled for deletion after a waiting period of 7–30 days. This prevents breaking apps that may still be using the secret. You can set the waiting period to 7 days for your secret and then delete it.
Tip #10: Use Amazon RDS Proxy for scalability, performance, and resilience
Applications take advantage of database connection pooling for better performance and for sharing connections within the application process. When such applications are deployed on auto scaling compute infrastructure, such as containers, they may scale out and in based on the load. This scaling behavior creates and destroys database connection pools, which may put a strain on the database server's resources. Because each connection pool is isolated and not shared by the application instances, it also leads to inefficient use of database connection resources. Serverless applications built with Lambda functions can't take advantage of connection pooling at all due to their transient nature.
Amazon RDS Proxy is a fully managed, highly available service that manages connection pools for your databases. It's supported for Amazon Aurora MySQL-Compatible Edition, Amazon Aurora PostgreSQL-Compatible Edition, Amazon RDS for MariaDB, Amazon RDS for MySQL, Amazon RDS for PostgreSQL, and Amazon RDS for SQL Server.
The DB connections managed by RDS Proxy are shared across multiple applications (or instances of the same application), achieving higher scalability. RDS Proxy returns the connection to the pool for reuse after each transaction; this is referred to as multiplexing. This mechanism leads to better resource usage on the database instances.
RDS Proxy continuously monitors the RDS database for failover. When a failover occurs, it creates connections to the new DB instance. This behavior is transparent to your application and may reduce the application recovery time from failover by up to 66%.
To access RDS Proxy, your application role requires access to the proxy instance. As a result, it offers an additional layer of security for access to the database. Applications can use RDS Proxy with no code changes. A proxy instance for Aurora exposes two endpoints: one for the writer and another for the readers. Keep in mind that if the cluster doesn’t have any read replicas, the connection to the proxy reader endpoint fails.
The following diagram illustrates this architecture.
It’s recommended that you consider RDS Proxy for your applications, especially for serverless apps.
Let’s try RDS Proxy. Complete the following steps:
- On the Amazon RDS console, choose Proxies in the navigation pane.
- Choose Create proxy.
- For Engine family, choose PostgreSQL.
- For Proxy identifier, enter aurora-dev-tips.
- For Database, choose your Aurora database.
- For Secrets Manager secret, choose the secret you created earlier.
- For Subnets, select the same subnets as your database subnet group.
- For VPC security group, select the security group allowed to connect to the DB instance.
- After the RDS Proxy instance is created, copy the endpoint from the proxy details page.
- To connect to the Aurora DB instance using the proxy endpoint, SSH to your bastion host and run psql (a sketch follows):
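A hedged example of the connection command, with the proxy endpoint, user, and database names as placeholders:

```bash
# Connect through the RDS Proxy endpoint instead of the cluster endpoint
psql -h aurora-dev-tips.proxy-xxxxxxxxxxxx.us-east-1.rds.amazonaws.com \
     -p 5432 -U postgres -d postgres
```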
You are prompted to provide the password. Because we set up the Secrets Manager secret with no rotation, you may enter the original password for the user. In application code, you would always get the secret from Secrets Manager. After you have connected to the DB instance from psql, you should be able to run SQL queries. Keep in mind that to use the proxy reader endpoint, you need at least one read replica in the cluster.
- When you’re done with experimentation, follow the steps to clean up the environment:
- Delete the RDS Proxy instance from the Amazon RDS console.
- Delete the secret from the Secrets Manager console.
As mentioned earlier, secrets aren't deleted immediately; they're scheduled for deletion after a waiting period of 7–30 days. This prevents breaking apps that may still be using the secret. Set the waiting period to 7 days for your secret and then delete it.
Clean up
Make sure to delete the Aurora DB cluster and associated resources if you don't need the cluster anymore.
Conclusion
In this second post of the two-part series, we covered tips for using AWS services in conjunction with Aurora. You can build feature-rich applications by using the Aurora integration with Lambda, which allows you to invoke Lambda functions from SQL code. Aurora also integrates with SageMaker and Amazon Comprehend for NLP, which means you can invoke trained ML models directly from SQL code. With ElastiCache, you can achieve better query performance for your application and reduce resource pressure on database instances. RDS Proxy manages multiplexed database connection pools; you can use it from your apps without any code changes for better performance, faster recovery from failover, and an improved security posture.
Review your applications to see how you can take advantage of these AWS service integrations with Aurora database clusters.
About the Author
Rajeev Sakhuja is a Solution Architect based out of New York City. He enjoys partnering with customers to solve complex business problems using AWS services. In his free time, he likes to hike and create video courses on application architecture and emerging technologies; check him out on Udemy.