AWS Database Blog

10 Amazon Aurora tips for application developers – Part 2

This is the second post of the two-part series on how application developers can take advantage of Amazon Aurora features, capabilities, and complementary services. In Part 1, we shared the first five tips:

  1. Split the application SQL read/write load to enhance your app’s performance.
  2. For Java apps, use AWS JDBC drivers for faster app recovery from Aurora failover.
  3. Use cloning to test applications against production data.
  4. Use Amazon Aurora Global Database for disaster recovery and cross-Region high performance reads.
  5. Get consistent application performance at lower cost with Amazon Aurora Serverless v2.

In this post, we share the next five tips, which show you how to enhance performance, security, and availability of your apps by taking advantage of AWS services that directly integrate with or complement Aurora databases.

Prerequisites

Make sure to review the prerequisites in Part 1. Although the examples in this post are shown for Amazon Aurora PostgreSQL-Compatible Edition, they may be ported for use with Amazon Aurora MySQL-Compatible Edition.

Tip #6: Integrate with other AWS services using AWS Lambda

You can invoke an AWS Lambda function from Amazon Aurora MySQL-Compatible Edition and Amazon Aurora PostgreSQL-Compatible Edition. This capability is exposed to database objects such as stored procedures and triggers. Amazon Aurora MySQL-Compatible Edition exposes native functions (lambda_sync, lambda_async) to run Lambda functions.

Amazon Aurora PostgreSQL-Compatible Edition requires the aws_lambda extension to be installed on the cluster. You can then use the functions provided by the extension to invoke Lambda functions. You can set up the code in the Lambda function to integrate with any AWS service.

Lambda functions can be invoked synchronously or asynchronously from the SQL code. Synchronous calls block the SQL code from running until the Lambda function returns, which may have an impact on the application’s performance. If you decide to use synchronous calls to Lambda functions, thoroughly test your application for performance and timeouts. Asynchronous calls to Lambda functions are non-blocking, but the SQL code doesn’t receive the results from the function call. Synchronous calls are referred to as RequestResponse invocations, and asynchronous calls are referred to as Event invocations.

This feature can come in handy when applications need to publish events on data changes. For example, consider a scenario in which an insert into the orders table should result in a confirmation email being sent to the customer via Amazon Simple Email Service (Amazon SES). To achieve this, an insert trigger on the orders table can invoke a Lambda function that sends out the email.
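
If you go down this route, the Lambda function itself is plain application code. The following is a minimal sketch of what such a handler might look like, assuming a verified Amazon SES sender identity; the payload fields (customer_email, order_id) are hypothetical and must match whatever your trigger passes in.

    # Minimal sketch of a Lambda handler that the database trigger could invoke.
    # Assumptions: the sender address is verified in Amazon SES, and the payload
    # fields customer_email and order_id are set by your trigger (hypothetical names).
    import json
    import boto3

    ses = boto3.client('ses')

    def lambda_handler(event, context):
        customer_email = event['customer_email']
        order_id = event['order_id']
        ses.send_email(
            Source='orders@example.com',  # replace with your SES-verified identity
            Destination={'ToAddresses': [customer_email]},
            Message={
                'Subject': {'Data': f'Order {order_id} confirmed'},
                'Body': {'Text': {'Data': f'Thank you! Order {order_id} has been received.'}}
            }
        )
        return {'statusCode': 200, 'body': json.dumps('Email sent')}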

If you decide to consider the Lambda invocation capability for your application, thoroughly test your application performance to ensure that the Lambda function invocation overhead (latency) is acceptable for your workload.

The following diagram illustrates this architecture.

Although the Lambda function may reside outside the database VPC, we recommend creating the Lambda function in the same VPC as your database instances. This simplifies the setup and provides better performance due to lower overhead.

To invoke a Lambda function from Aurora, complete the following steps:

  1. Ensure that the DB instances are created in a subnet with a NAT gateway; otherwise, a VPC interface endpoint for Lambda is needed.
  2. Create a Lambda function in the same Region as the cluster. You may use the boilerplate Hello from Lambda sample.
  3. Set up an AWS Identity and Access Management (IAM) role to allow the DB instance to invoke the Lambda function:
    1. To create an IAM policy, run the following command from the bastion host shell:
      aws iam create-policy  --policy-name   aurora-dev-tips-policy --policy-document '{
          "Version": "2012-10-17",
          "Statement": [
              {
              "Sid": "AllowAccessToLambdaFunction",
              "Effect": "Allow",
              "Action": "lambda:InvokeFunction",
              "Resource": "<<REPLACE with Lambda function ARN>>"
              }
          ]
      }'
    2. To create the IAM role, run the following command from the bastion host shell:
      aws iam create-role  --role-name aurora-dev-tips-role --assume-role-policy-document '{
          "Version": "2012-10-17",
          "Statement": [
              {
              "Effect": "Allow",
              "Principal": {
                  "Service": "rds.amazonaws.com"
              },
              "Action": "sts:AssumeRole"
              }
          ]
      }'
    3. Attach the IAM policy to the role:
      aws iam attach-role-policy --policy-arn <<Replace with IAM Policy ARN>> --role-name aurora-dev-tips-role --region <<AWS Region>>
  4. Add the IAM role to the DB cluster for Lambda invocation (adjust the DB cluster identifier accordingly):
    aws rds add-role-to-db-cluster --db-cluster-identifier <<DB Cluster identifier>> --feature-name Lambda --role-arn  <<Replace with IAM Role ARN>> --region <<AWS Region>>
  5. Use the code for testing on PostgreSQL:
    1. Install the aws_lambda extension:
      psql => CREATE EXTENSION IF NOT EXISTS aws_lambda CASCADE;
    2. Run the aws_lambda.invoke(..) function to invoke the Lambda function:
      psql => SELECT * FROM aws_lambda.invoke(aws_commons.create_lambda_function_arn('<<Replace with Lambda function ARN>>', '<<AWS Region>>'), '{"body": "Hello from Postgres!"}'::json);
  6. When you’re done experimenting, clean up the environment:
    1. Delete the Lambda function.
    2. Remove the role from the database:
      aws rds remove-role-from-db-cluster --db-cluster-identifier aurora-pg-dev-tips --role-arn <<Replace with Role ARN>>
    3. Detach the policy from the role:
      aws iam detach-role-policy  --role-name   aurora-dev-tips-role --policy-arn  <<Replace with Policy ARN>>
    4. Delete the role:
      aws iam delete-role --role-name aurora-dev-tips-role
    5. Delete the policy:
      aws iam delete-policy --policy-arn <<Replace with Policy ARN>>

Tip #7: Run machine learning on transactional data and run predictions

Amazon SageMaker enables you to build, train, and deploy custom machine learning (ML) models quickly. Amazon Comprehend is a natural language processing (NLP) service that uses ML models to find insights in text. You can use Amazon Comprehend to determine the sentiment of content in UTF-8 encoded text documents.

Aurora integrates directly with SageMaker and with Amazon Comprehend for sentiment analysis. Amazon Aurora MySQL-Compatible Edition exposes native functions for using this capability from SQL code in stored procedures and triggers. Amazon Aurora PostgreSQL-Compatible Edition requires the aws_ml extension to be installed on the cluster.

Consider a scenario in which you want to run sentiment analysis on data in the transactional database. For example, you can use sentiment analysis on the comments on a blog post to determine whether your readers liked the post. One way to do this would be to export the data out of the database, run ML training on it to create an ML model, and then use that model from the application code to do the sentiment analysis. With the Aurora integration with Amazon Comprehend, you can instead use a pre-trained model to do the sentiment analysis on the data in the database. The integration saves you the ML training effort and doesn't require any ML skills.

The following diagram illustrates this architecture.


Let’s try out sentiment analysis on data in a table. Complete the following steps:

  1. Ensure that the DB instances are created in a subnet with a NAT gateway; otherwise, a VPC interface endpoint for Amazon Comprehend is needed.
  2. Set up an IAM role to allow the DB instance to invoke Amazon Comprehend:
    1. Create a policy to invoke Amazon Comprehend features:
      aws iam create-policy  --policy-name   aurora-dev-tips-comprehend-policy --policy-document '{
        "Version": "2012-10-17",
        "Statement": [
          {
            "Sid": "AllowAuroraToInvokeComprehendDetectSentiment",
            "Effect": "Allow",
            "Action": [
              "comprehend:DetectSentiment",
              "comprehend:BatchDetectSentiment"
            ],
            "Resource": "*"
          }
        ]
      }'
      
    2. Create a role that is assumed by Aurora:
      aws iam create-role  --role-name aurora-dev-tips-comprehend-role --assume-role-policy-document '{
          "Version": "2012-10-17",
          "Statement": [
              {
              "Effect": "Allow",
              "Principal": {
                  "Service": "rds.amazonaws.com"
              },
              "Action": "sts:AssumeRole"
              }
          ]
      }'
      
    3. Attach the policy to the role:
      aws iam attach-role-policy --policy-arn <<Replace with Policy ARN>> --role-name aurora-dev-tips-comprehend-role --region <<Replace with AWS Region>>
  3. Add the IAM role to the DB cluster for Amazon Comprehend invocation (adjust the DB cluster identifier accordingly):
    aws rds add-role-to-db-cluster --db-cluster-identifier aurora-pg-dev-tips --feature-name Comprehend --role-arn <<Replace with Role ARN>>   --region <<Replace with Region>>
  4. Use the code for testing on PostgreSQL:
    1. Create a test table using psql:
      CREATE TABLE IF NOT EXISTS comments (
             comment_id SERIAL PRIMARY KEY,
             comment_text VARCHAR(255) NOT NULL
      );
      
    2. Populate the test table with some comments data:
      INSERT INTO comments (comment_text) VALUES ('This is very useful, thank you for writing it!');
      INSERT INTO comments (comment_text) VALUES ('I don''t like how this was implemented.');
      INSERT INTO comments (comment_text) VALUES ('Its OK.');
  5. Invoke sentiment analysis on the data in the table; pay attention to the confidence level (a sketch of reading it from application code follows these steps):
    SELECT comment_text, aws_comprehend.detect_sentiment(comment_text, 'en') AS sentiment  FROM comments;
  6. When you’re done experimenting, clean up the environment:
    1. Remove the role from the database:
      aws rds remove-role-from-db-cluster --db-cluster-identifier aurora-pg-dev-tips --role-arn <<Replace with Role ARN>>
    2. Detach the policy from the role:
      aws iam detach-role-policy  --role-name aurora-dev-tips-comprehend-role --policy-arn  <<Replace with Policy ARN>>
    3. Delete the role:
      aws iam delete-role --role-name aurora-dev-tips-comprehend-role
    4. Delete the policy:
      aws iam delete-policy --policy-arn <<Replace with Policy ARN>>
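
If you want to work with the results from application code rather than psql, the following is a minimal sketch of reading both the sentiment label and the confidence score with psycopg2 (run it before you remove the Comprehend role in the cleanup steps). It assumes psycopg2-binary is installed and that connection settings are provided through the standard PGHOST, PGUSER, PGPASSWORD, and PGDATABASE environment variables.

    # Minimal sketch: read the sentiment label and confidence score for each comment.
    # Assumes psycopg2-binary is installed and the PGHOST, PGUSER, PGPASSWORD, and
    # PGDATABASE environment variables point at the Aurora cluster.
    import os
    import psycopg2

    conn = psycopg2.connect(
        host=os.environ.get('PGHOST'),
        database=os.environ.get('PGDATABASE'),
        user=os.environ.get('PGUSER'),
        password=os.environ.get('PGPASSWORD')
    )

    sql = """
        SELECT comment_text, (s).sentiment, (s).confidence
        FROM (
            SELECT comment_text,
                   aws_comprehend.detect_sentiment(comment_text, 'en') AS s
            FROM comments
        ) AS t
    """

    with conn.cursor() as cur:
        cur.execute(sql)
        for comment_text, sentiment, confidence in cur.fetchall():
            # Flag results where the model is less certain
            note = '' if confidence >= 0.8 else ' (low confidence)'
            print(f"{sentiment}{note}: {comment_text}")

    conn.close()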

Tip #8: Use Amazon ElastiCache for better performance

With relational databases, there comes a point when further improvement of performance may be costly or not even possible. Amazon ElastiCache is a managed caching service that supports Memcached and Redis. You can use ElastiCache as a caching layer for relational as well as NoSQL databases to achieve lower latency for data reads. An added benefit of using ElastiCache in front of Aurora is that it may also reduce the cost of I/O associated with the database.

The following diagram illustrates this architecture.


In general, consider using ElastiCache for workloads that need low read latency or that have resource-intensive joins or calculations whose results can be materialized in a cache layer. With ElastiCache, you have two choices for your cache layer: Memcached is an in-memory-only cache, whereas Redis also offers durability. Implementing the caching layer involves first checking the cache for the requested data and serving it from the cache if it’s available; otherwise, you read from the database and add the result to the cache so that subsequent requests for the same data are faster.

Let’s try out ElastiCache for Redis with Amazon Aurora PostgreSQL-Compatible Edition. In this test, you use Redis’s in-memory key-value storage. You add key-value data to the Redis cluster for faster reads, and each key-value pair carries a time to live (TTL) that specifies the number of seconds for which the pair remains available for reads.

In this test, we create a table and populate it with 10 million rows. We run SELECT on the table to get a count of rows with numbers that are divisible by 3. The count is cached in the Redis cluster with a TTL of 30 seconds. In case of cache hits, you will observe a considerable reduction in latency.

  1. On the ElastiCache console, choose Redis clusters in the navigation pane.
  2. Choose Create Redis cluster.
  3. Disable Multi-AZ (for a production cluster, you should always enable it).
  4. Disable auto-failover (for a production cluster, you should always enable it).
  5. For Node type, choose cache.t2.small.
  6. For Number of replicas, choose 0 (for a production cluster, always set it to greater than 0).
  7. For Subnet group, select Create new subnet group.
  8. For Subnet group name, enter a name.
  9. For VPC ID, choose the VPC in which you have the bastion host.
  10. Leave all other properties as default and create the cluster.

Now you can create a table and populate it with some data.

  1. Connect to your PostgreSQL instance using psql:
    => CREATE TABLE redis_test(id int);
    => INSERT INTO redis_test(id) SELECT * FROM generate_series(1,10000000) AS id;
  2. Set up the Python 3.x environment on your bastion host:
    1. Install Python 3:
      sudo yum install python3 -y
    2. Install packages for Redis and PostgreSQL:
      pip3 install redis
      pip3 install psycopg2-binary
    3. Copy and enter the code in a file (pgcache.py):
      # Package dependencies - pip3 install psycopg2-binary redis
      # Assumes table exists & has data
      import psycopg2
      import redis
      import os
      import sys
      import time
      import json
      # Set up the environment variables in the terminal/shell (see the export commands below)
      USER = os.environ.get('PGUSER')
      PASSWORD = os.environ.get('PGPASSWORD')
      DATABASE = os.environ.get('PGDATABASE')
      HOST = os.environ.get('PGHOST')
      REDIS_URL = os.environ.get('REDIS_URL')
      # Setup time to live in seconds
      REDIS_TTL = 30
      
      # Create the connection to the database
      conn = psycopg2.connect(
          host= HOST,
          database = DATABASE,
          user = USER,
          password = PASSWORD
      )
      # Create a connection to Redis cluster
      cache = redis.Redis.from_url(REDIS_URL)
      
      # Function that gets the count of numbers divisible by 3
      def get_count_divisible_by_3():
          sql = "SELECT count(*) FROM redis_test WHERE (id % 3) = 0"
          # get the count from cache
          # Using the sql statement as the key
          res = cache.get(sql)
          if res:
              print("Cache HIT")
              return res
          # Not found in cache
          print("Cache MISS")
          # Get the count from database table
          cur = conn.cursor()
          cur.execute(sql)
          rows = cur.fetchall()
          count=json.dumps(rows)
          # Add to cache with key=sql statement & value=count
          cache.setex(sql, REDIS_TTL, count)
          return count
      # Time the function call
      start = time.time()
      rows = get_count_divisible_by_3()
      end = time.time()
      print("Time taken in milliseconds = ",round((end - start)*1000)," ms")
      # Close connections
      cache.close()
      conn.close()
      

Now you can run the Python code and compare the performance.

  1. Set up the environment variables in the bastion host shell:
    export  PGUSER=<<Your PostgreSQL user name>>
    export  PGPASSWORD=<<Your PostgreSQL password>>
    export  PGDATABASE=<<Your PostgreSQL database>>
    export PGHOST=<<Cluster endpoint for your Aurora cluster>>
    export  REDIS_URL=redis://<<Copy & paste Redis endpoint>>
  2. Run the Python code:
    python3 pgcache.py
  3. Run it a couple of times and observe the difference in latency.
    The cache TTL is 30 seconds, so you should see a latency spike from a cache miss roughly every 30 seconds. Feel free to change the TTL in the code and try again.
  4. When you’re done with the experimentation, perform the following cleanup steps:
    1. Delete the ElastiCache for Redis cluster from the ElastiCache console.
    2. Set the Create backup option to No.
    3. Drop the table using psql:
      => DROP TABLE redis_test;

Tip #9: Secure your DB credentials with AWS Secrets Manager

Applications access databases and other resources using credentials. Typically, these credentials are managed in the application in the form of property files, environment variables, or some other configuration management system. These traditional methods require the database owners to create the credentials and hand them over to the application development teams. This mechanism has multiple shortcomings: there is a high chance of credential abuse, rotating the credentials (passwords) requires manual intervention, and hand-offs and application changes are slow.

AWS Secrets Manager integrates with Aurora to address the challenges associated with managing database credentials in the application. Secrets Manager stores the credentials as encrypted strings that are accessible only to authorized principals (IAM roles and users). Secrets are encrypted, and an IAM principal must have permission to use the encryption key to decrypt the secret. You can enable automatic rotation for the password (stored as a secret), which rotates it on a schedule set by the user. The rotation is carried out by an automatically created Lambda function that has permission to access both the database and the secret.

The following diagram illustrates this architecture.


We highly recommend managing the Aurora credentials in Secrets Manager because it improves the security posture of both your application and the database.

Let’s try out Secrets Manager. In this exercise, you create a secret for your Aurora cluster and then read it using Python code.

  1. On the Secrets Manager console, choose Secrets in the navigation pane.
  2. Choose Store a new secret.
  3. For Secret type, select Credentials for Amazon RDS database.
  4. Under Credentials, enter the user name and password for your PostgreSQL database.
  5. Under Database, select the Aurora PostgreSQL cluster.
  6. Choose Next.
  7. Set the secret name as test/auroratips/postgres.
  8. Optionally, set the secret rotation option.

For this post, we don’t rotate the PostgreSQL user password. As a best practice, always use this option for your production infrastructure.

  1. Set up the Python 3 environment in a bastion host shell:
    pip3 install boto3
  2. Set up the environment variables in a bastion shell (these are used by the sample code):
    export AWS_REGION=<<Set it to your Region, e.g., us-east-1>>
    export SECRET_NAME=test/auroratips/postgres
    
  3. Copy and enter the code to a file on the bastion host (getsecret.py):
    # Sample shows how to get the secret 
    # Python dependency - pip3 install boto3
    import boto3
    import os
    from botocore.exceptions import ClientError
    
    SECRET_NAME = os.getenv('SECRET_NAME')
    AWS_REGION = os.getenv('AWS_REGION')
    
    def get_secret():
        secret_name = SECRET_NAME 
        region_name = AWS_REGION
        # Create a Secrets Manager client
        session = boto3.session.Session()
        client = session.client(
            service_name='secretsmanager',
            region_name=region_name
        )
        # Get the secret
        try:
            get_secret_value_response = client.get_secret_value(
                SecretId=secret_name
            )
        except ClientError as e:
            raise e
        else:
            return get_secret_value_response['SecretString']
    # call function to get the secret
    your_secret = get_secret()
    
    # Put code for connecting with the database
    print(your_secret)
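
The SecretString for a secret of type Credentials for Amazon RDS database is a JSON document with fields such as username, password, host, and port. The following is a minimal sketch of fetching the secret and using it to open a database connection; it assumes boto3 and psycopg2-binary are installed, reuses the SECRET_NAME and AWS_REGION environment variables from above, and uses an example database name that you should replace with your own.

    # Minimal sketch: fetch the secret and open a database connection with it.
    # Assumes boto3 and psycopg2-binary are installed; SECRET_NAME and AWS_REGION
    # are the same environment variables used by getsecret.py, and dbname below
    # is an example that you should replace with your own database name.
    import json
    import os

    import boto3
    import psycopg2

    client = boto3.client('secretsmanager', region_name=os.getenv('AWS_REGION'))
    secret = json.loads(
        client.get_secret_value(SecretId=os.getenv('SECRET_NAME'))['SecretString']
    )

    conn = psycopg2.connect(
        host=secret['host'],
        port=secret.get('port', 5432),
        user=secret['username'],
        password=secret['password'],
        dbname='postgres'  # example database name
    )

    with conn.cursor() as cur:
        cur.execute('SELECT version()')
        print(cur.fetchone()[0])

    conn.close()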

We use this secret in the next exercise. If you don’t want to try it out, follow the cleanup steps to delete the secret; otherwise, move on to the next tip.

  1. Delete the secret from the Secrets Manager console.

Secrets aren’t deleted immediately; they’re scheduled for deletion after a waiting period of 7–30 days. This prevents failure of apps that may still be using the secret. You can set the waiting period to 7 days for your secret and then delete it.
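
If you prefer to script the cleanup instead of using the console, the following minimal boto3 sketch schedules the secret for deletion with the shortest allowed waiting period; the secret name is the one created earlier in this tip, and AWS credentials and Region are assumed to be configured in your environment.

    # Minimal sketch: schedule the secret for deletion with a 7-day recovery window.
    # Assumes boto3 is installed and AWS credentials/Region are configured.
    import boto3

    client = boto3.client('secretsmanager')
    client.delete_secret(
        SecretId='test/auroratips/postgres',
        RecoveryWindowInDays=7  # 7 is the minimum waiting period; the default is 30
    )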

Tip #10: Use Amazon RDS Proxy for scalability, performance, and resilience

Applications take advantage of database connection pooling for better performance and for sharing connections within the application process. When such applications are deployed on auto scaling compute infrastructure, such as containers, they may scale out and in based on the load. This scaling behavior creates and destroys database connection pools, which can put a strain on the database server resources. Because each connection pool is isolated and not shared across application instances, it leads to inefficient use of database connection resources. Serverless applications built with Lambda functions are unable to take advantage of connection pooling at all due to their transient nature.

Amazon RDS Proxy is a fully managed, highly available service that manages RDS database connection pools. It’s supported for Amazon Aurora MySQL-Compatible Edition, Amazon Aurora PostgreSQL-Compatible Edition, Amazon RDS for MariaDB, Amazon RDS for MySQL, Amazon RDS for PostgreSQL, and Amazon RDS for SQL Server.

The DB connections managed by RDS Proxy are shared across multiple applications (or instances of the same application), achieving higher scalability. RDS Proxy returns the connection to the pool for reuse after each transaction; this is referred to as multiplexing. This mechanism leads to better resource usage on the database instances.

RDS Proxy continuously monitors the RDS database for failover. When a failover occurs, it creates connections to the new DB instance. This behavior is transparent to your application and may reduce the application recovery time from failover by up to 66%.

To access RDS Proxy, your application must be granted access to the proxy instance, which adds a layer of security in front of the database. Applications can use RDS Proxy with no code changes. A proxy for an Aurora cluster exposes two endpoints: one for the writer and one for the readers. Keep in mind that if the cluster doesn’t have any read replicas, connections to the proxy reader endpoint fail.

The following diagram illustrates this architecture.

It’s recommended that you consider RDS Proxy for your applications, especially for serverless apps.

Let’s try RDS Proxy. Complete the following steps:

  1. On the Amazon RDS console, choose Proxies in the navigation pane.
  2. Choose Create proxy.
  3. For Engine family, choose PostgreSQL.
  4. For Proxy identifier, enter aurora-dev-tips.
  5. For Database, choose your Aurora database.
  6. For Secrets Manager secret, choose the secret you created earlier.
  7. For Subnets, select the same subnets as your database subnet group.
  8. For VPC security group, select the security group allowed to connect to the DB instance.
  9. After the RDS Proxy instance is created, copy the endpoint from the proxy details page.
  10. To connect to the Aurora DB instance using the proxy endpoint, SSH to your bastion host:
    psql -U <<Provide PostgreSQL user name>> -h <<Paste proxy endpoint>>

You are prompted to provide the password. Because we set up the Secrets Manager secret without rotation, you may enter the original password for the user; in application code, you should always retrieve the credentials from Secrets Manager instead. After you have connected to the DB instance from psql, you should be able to run SQL queries. Keep in mind that to use the proxy reader endpoint, the cluster needs at least one read replica.
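
In application code, connecting through the proxy looks like the following minimal sketch: the host is the RDS Proxy endpoint and the credentials come from the Secrets Manager secret created in Tip #9. It assumes boto3 and psycopg2-binary are installed; RDS_PROXY_ENDPOINT is a hypothetical environment variable holding the endpoint you copied, and the database name is an example.

    # Minimal sketch: connect through RDS Proxy with credentials from Secrets Manager.
    # Assumes boto3 and psycopg2-binary are installed; RDS_PROXY_ENDPOINT, AWS_REGION,
    # and SECRET_NAME are example environment variable names, and dbname is an example.
    import json
    import os

    import boto3
    import psycopg2

    secrets = boto3.client('secretsmanager', region_name=os.getenv('AWS_REGION'))
    secret = json.loads(
        secrets.get_secret_value(SecretId=os.getenv('SECRET_NAME'))['SecretString']
    )

    conn = psycopg2.connect(
        host=os.getenv('RDS_PROXY_ENDPOINT'),  # proxy endpoint instead of the cluster endpoint
        user=secret['username'],
        password=secret['password'],
        dbname='postgres',   # example database name
        sslmode='require'    # use TLS; required if the proxy enforces TLS
    )

    with conn.cursor() as cur:
        cur.execute('SELECT 1')
        print(cur.fetchone())

    conn.close()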

  1. When you’re done with experimentation, follow the steps to clean up the environment:
    • Delete the RDS Proxy instance from the Amazon RDS console.
    • Delete the secret from the Secrets Manager console.

As mentioned earlier, secrets aren’t deleted immediately; they’re scheduled for deletion after a waiting period of 7–30 days to prevent failure of apps that may still be using the secret. Set the waiting period to 7 days for your secret and then delete it.

Clean up

Make sure to delete the Aurora DB cluster and associated resources if you don’t need the cluster anymore.

Conclusion

In this second post of a two-part series, we covered tips for using AWS services in conjunction with Aurora. You can build feature-rich applications by using the Aurora integration with Lambda, which allows you to invoke Lambda functions from the SQL code. Aurora also integrates with SageMaker and Amazon Comprehend for NLP, which means that you can invoke trained ML models directly from the SQL code. With ElastiCache, you can achieve better query performance for your application and reduce the resource pressure on database instances. RDS Proxy manages database connection pools that are multiplexed. You can use RDS Proxy from apps without any code changes for better performance, faster recovery from failover, and improved app security posture.

Review your applications to see how you can take advantage of these AWS service integrations with Aurora database clusters.


About the Author

Rajeev Sakhuja is a Solution Architect based out of New York City. He enjoys partnering with customers to solve complex business problems using AWS services. In his free time, he likes to hike and create video courses on application architecture and emerging technologies; check him out on Udemy.