Select the right encryption options for Amazon RDS and Amazon Aurora database engines
Customers are increasingly choosing to encrypt their AWS Cloud databases and data stores by default. This trend is only gaining speed with the evolving meaning of sensitive data (personally identifiable information [PII], etc.) within various regulatory frameworks. Customers also ask AWS for guidance on how to best adopt the latest database encryption options, while maintaining performance and quick iteration on their products.
This post discusses the range of encryption options available in the Amazon RDS MySQL, MariaDB, and Amazon Aurora MySQL database engines. It is intended to help you choose the right options appropriate for your workload and business needs. For simplicity, I refer to these engines as “RDS MySQL-related” engines, or “MySQL-related” engines.
Encryption options available in RDS can fall into in three categories:
- Encryption options for data at rest. These include platform-wide capabilities as well as features of the database engine itself.
- Encryption options for data in transit. These ensure protection of data while it is traveling over the network between the database and clients.
- Encryption options for data in transit during replication. Although not much different than the preceding category, these options are specific for data replication traffic between database servers.
I review each of these categories and suggest how to implement them most effectively, according to your needs.
Encryption of data at rest
Many AWS customers using RDS MySQL-related database engines rely on encrypting RDS resources. With RDS-encrypted resources, data is encrypted at rest, including the underlying storage for a database (DB) instance, its automated backups, read replicas, and snapshots. This capability uses the open standard AES-256 encryption algorithm to encrypt your data, which is transparent to your database engine.
For RDS MySQL and MariaDB, Amazon EBS provides the underlying storage and snapshot capability. Aurora uses a purpose-built, distributed, and log-structured storage service. Encrypted Aurora DB clusters enable you to encrypt data persistently stored by the storage service, along with associated backups stored in Amazon S3.
This encryption option protects against physical exfiltration or access of your data bypassing the DB instances. It is therefore critical to complement encrypted resources with an effective encryption key management and database credential management practice to mitigate any unauthorized access. Otherwise, compromised credentials or insufficiently protected keys might allow unauthorized users to access the plaintext data directly through the database engine.
Encryption key management is provided using the AWS KMS, which allows you to create encryption keys and define the policies that control the use of these keys. RDS MySQL-related engines can use two types of keys:
- The default key
- AWS KMS customer master keys (CMKs)
A default key is available with each AWS account and enables a one-press solution for encrypting your RDS resources. It is easy to use but does not allow you to take any advanced credential management actions, such as rotation, revocation, or deletion. The default key is ideal for customers that do not want to take on the burden of key management, preferring to delegate that responsibility to AWS.
CMKs, in contrast, provide you with full control of the lifecycle of the encryption keys, access controls, and audit-ability, but require additional management overhead on your part.
Encrypted resources provide an additional layer of security to your backups and snapshots. However, you may need to take extra steps when you share snapshots, restore DB instances and clusters from encrypted snapshots, or copy snapshots to another Region. For example, snapshots using the default key cannot be shared directly with other AWS accounts. First, you copy the snapshot and then change the key to a CMK.
Sharing a CMK-encrypted snapshot also requires you to grant the target AWS account access to the CMK used by the snapshot. During copy operations, the data remains encrypted as RDS employs envelope encryption, where individual data keys that physically encrypt the data are themselves encrypted using the indicated KMS key.
Encrypted resources leverage the hardware acceleration provided by modern CPUs (AES-NI or newer instruction set versions). The storage level implementation ensures that the performance overhead of enabling encryption is negligible.
The following example illustrates the impact, by running a sysbench OLTP read/write benchmark on a single node Aurora MySQL DB cluster of r4.16xlarge size, using 2,000 concurrent connections. The following chart represents more than one hour of average runtime, tracking the Select Latency (reads) and DML Latency (writes) Amazon CloudWatch metrics of a KMS-encrypted versus not-encrypted database cluster:
The average query latency differs only slightly. Write latencies are about 0.027 milliseconds slower for encrypted clusters in this test. The equivalent effects of encryption on reads are typically reduced or obscured by page caching in the buffer pool. This test found reads 0.001 milliseconds slower, with a high buffer pool hit ratio, close to 100%. Keep in mind that this is a synthetic test. Results may vary for your workload, based on many factors.
Although they are popular, RDS-encrypted resources are not the only way to implement data encryption at rest. Some use cases require the implementation of encryption at the logical layer—for example, to limit data access between different users. Many customers implement both, as they can serve different purposes. There are two ways to do so:
- Encryption of data in fields using encryption functions
- Encryption of data client-side before transmission to the database server
RDS MySQL-related database engines do not provide any built-in encryption key management capabilities. Managing the encryption function keys, therefore, becomes an implementation concern. The risk of exposure increases as the keys pass to the database engine—in SQL statements, logs, the engine’s process list, or other monitoring capabilities. The risk remains even if the client connections to the database are encrypted. Consequently, this approach might not meet all the data protection requirements applicable to you.
On the other hand, by performing encryption on the client side, you avoid the risk of inadvertently exposing keys in the database engine logging and monitoring features. Client-side encryption also removes from the database engine any burden of performing encryption/decryption operations.
Both approaches to logically encrypting the data introduce limitations to your data’s indexability and searchability. The database engine doesn’t store the plaintext values in the encrypted fields. The construction of indexes using the ciphertext values affects the ordering of your indexes and cardinality. Cyphertext indexing might impact execution plan decisions and thus performance. Similarly, database engines can at most evaluate encrypted fields in terms of “equal to” or “not equal to” predicates. Any other evaluations must be performed after data decryption.
As such, logical data encryption has generally been implemented only in narrow use cases, such as encrypting a person’s social security number (SSN) in a user data table. In that case, logical encryption limits access to the SSN to users with the required key. But search queries in the database are practically also limited to identifying users with a particular SSN only.
RDS does not support tablespace encryption.
Encryption of data in transit
RDS MySQL-related database engines allow you to establish Transport Layer Security (TLS) encrypted connections to the database engine. These connections are frequently also called Secure Sockets Layer (SSL) encryption, although that term specifically refers to the now-deprecated predecessor cryptographic protocol of TLS, which is not supported by RDS.
RDS creates an SSL certificate and installs the certificate on the DB instance when RDS provisions the instance. A certificate authority signs these certificates. The SSL certificate includes the DB instance endpoint as the common name (CN) for the SSL certificate to guard against spoofing attacks.
Aurora MySQL DB clusters also include the cluster endpoints as subject alternative names (SAN) in the certificate. Your database driver or client must support SAN to use cluster database endpoints for encrypted connections with Aurora and to be able to verify the server certificate or identity.
From a server-side perspective, MySQL-based engines enforce the requirement to use SSL connections at the database-user level. Ensure that any user accounts employing remote connections require the use of SSL. The specific command varies based on the version of the engine and is available in the respective engine’s documentation.
Clients can also require the use of SSL connections. AWS recommends verifying the server identity using the ssl_mode = VERIFY_IDENTITY option (‘ssl-verify-server-cert’ in older versions), to mitigate the risk of connecting to unintended server endpoints.
The version of TLS cryptographic protocol also varies with the database engine version, with RDS MySQL 5.6, older versions of MariaDB 10.0 and 10.1, and Aurora MySQL 5.6 only supporting TLS 1.0. Furthermore, the cryptographic library used by RDS changed, transitioning with newer versions of the database engines to OpenSSL from yaSSL. Newer versions using OpenSSL also support TLS 1.2. For detailed and up-to-date TLS support information, see the following topics:
- Using SSL with a MySQL DB Instance
- Using SSL with a MariaDB DB Instance
- Using SSL with Aurora MySQL DB Clusters
For RDS MySQL-related engine versions that support multiple versions of TLS (such as MySQL 8.0), you can use the tls_version parameter in the DB instance parameter group to indicate the permitted protocol versions. Similar client parameters exist for most client tools or database drivers. By default, the database engine attempts to use the highest TLS protocol version allowed by both the server and client configuration.
Establishing encrypted database connections involves overhead, both in terms of compute resources and response latency to first query. However, the newer engine versions, using the OpenSSL cryptographic library, offer lower overhead.
To illustrate the impact of SSL connection overhead, factoring in the cryptographic library, I ran a test establishing 100,000 distinct connections using 128 client threads. I measured the client latency starting with connection establishment through running a generic ‘SELECT 1;’ query. I used the r4.16xlarge instance class for testing. I repeated the test three times each for RDS MySQL 5.7, MariaDB 10.2, and Aurora MySQL 5.7 and averaged the results (shorter bars are better):
As you can see, connection latencies are more than twice as high for encrypted connections. Also, connection latencies for encrypted connections to older engine versions bundled with the yaSSL cryptographic library are three times higher than newer versions using OpenSSL. Keep in mind that this is a synthetic test. The results may vary for your workload based on concurrency, number of connections, instance class, and other factors.
The following types of applications may see an outsized impact due to the encryption overhead:
- Applications that can’t leverage connection pools.
- Applications that require frequent draining of connection pools.
- Applications that aggressively attempt reconnects when query response latency increases above certain thresholds.
Some of these behaviors don’t follow best practices. But in many cases, legacy applications are hard to change. To implement SSL connections for such use cases, test and plan for enough compute capacity to handle spikes in connection requests. Also, set the max_connections and max_user_connections parameters in the DB instance parameter group to a reasonable value. That is, set these values slightly above your highest expected peak in concurrent connections, but low enough to act as a safety, preventing runaway processes from storming the database with connection requests.
Encryption of data in transit, during replication
RDS MySQL-related database engines enable ongoing logical data replication using the binary log (binlog). Replica servers connect to the master (source) just like any other MySQL client, and issue commands requesting binlogs from the master. With RDS MySQL-related engines, binlog-based replication is available in two forms:
- RDS-managed read replicas, both within the same Region (same database subnet group), or cross-region read replicas.
- Manual, externally configured binlog replication.
RDS-managed read replicas enable read scaling, and cross-region DR use cases. With this capability, the replication is managed and monitored by AWS. Data is replicated across Regions using a secure communications channel between the source DB instance and the read replica.
RDS establishes any AWS security configurations needed to enable the secure channel, such as adding security group entries. Aurora MySQL uses a different mechanism to provide readable replicas within the same database cluster in a given Region, which does not rely on binlogs. However, it does offer the capability to create binlog-based read replica clusters in different Regions.
With manual, externally configured binlog replication, you are responsible for configuring, managing, and monitoring the binlog replication, as well as providing the network connectivity between the different servers involved in the replication topology.
Aurora MySQL 5.6 allows you to establish an SSL encrypted connection for replication from an Aurora MySQL replica, to an external MySQL master hosted on-premises, or deployed on Amazon EC2. This capability is enabled using a service-provided, stored procedure that allows you to import the SSL key material from the MySQL master server, on the Aurora cluster:
This capability is not available for RDS MySQL or MariaDB, nor is the key material for a master running in RDS or Aurora accessible to you. This restricts the use of SSL encrypted binlog replication for other use cases than the one described earlier.
In many cases, it is possible to work around this limitation by sending unencrypted binlog traffic over a VPN connection. Alternatively, if you require full end-to-end encryption of replication traffic, you might find a feasible alternative in AWS DMS, which offers encrypted connections to both source and target endpoints.
Aurora MySQL also provides a data replication mechanism independent of logical replication and binlogs. Available in Aurora Global Database, this feature relies on the replication at the physical layer of the redo log stream across AWS Regions. This technology provides for a superior replication experience, capable of handling a higher throughput of changes, with a lower and more predictable replication lag.
Aurora global databases use dedicated infrastructure to replicate your data, leaving database resources available entirely to serve application workloads. This dedicated infrastructure also automatically encrypts network traffic between the Regions to protect your data.
In this post, I reviewed the capabilities of RDS to encrypt data at rest and in transit for the MySQL and MariaDB managed database engines, as well as the Aurora database engine. I also highlighted important considerations regarding implementation, management, and performance of encrypted resources.
About the Author
Vlad Vlasceanu is a specialist solutions architect at Amazon Web Services. He works with our customers to provide guidance and technical assistance on database projects, helping them improving the value of their solutions when using AWS.