AWS Database Blog

Amazon Aurora as an Alternative to Oracle RAC

Written by David Yahalom, CTO and co-founder of NAYA Tech—a leading database, big data, and cloud professional and consulting service provider, located in San Jose, CA. David is a certified Oracle, Apache Hadoop, and NoSQL database expert and a cloud solutions architect.

Oracle Real Application Clusters (Oracle RAC) is considered to be one of the most advanced and capable technologies for enabling a highly available and scalable relational database, and it is widely treated as the default standard for creating highly available and scalable Oracle databases.

However, with the ever-increasing adoption of cloud, open-source, and platform-as-a-service (PaaS) database architectures, many organizations are searching for their next-generation relational database engine. They’re looking for one that can provide levels of high availability and scalability similar to Oracle RAC, but in a cloud/PaaS model, while maintaining the freedom of open source software.

In this post, I discuss how Amazon Aurora can serve as a powerful and flexible alternative to Oracle RAC. Both Oracle RAC and Amazon Aurora are designed to provide increased high availability and performance scalability for your databases. But they approach these goals from very different directions using different architectures:

  • Oracle RAC uses an intricate end-to-end software stack developed by Oracle—including Oracle Clusterware and Grid Infrastructure, Oracle Automatic Storage Management (ASM), Oracle Net Listener, and Oracle Database itself—combined with enterprise-grade storage that enables a shared-everything database cluster technology.
  • Aurora simplifies the database technology stack by using AWS ecosystem components to transparently enable higher availability and performance for MySQL and PostgreSQL databases.

So let’s dive right in!

Preface
I’ll start this blog post with a quick disclaimer. I’m what you would call a “born and raised” Oracle database administrator (DBA). My first job, going back 15 years, was as an Oracle DBA, where I was responsible for administering and developing code for production Oracle 8 (sans i) databases. Since then, I’ve had the opportunity to work as a database architect and administrator with all Oracle versions up to and including the latest Oracle 12c Release 2. Throughout my career, I’ve delivered many successful projects using Oracle as the relational database component.

I’d also like to preface this post by saying that I love the capabilities, features, technology, and power of Oracle Database. There’s really no denying that Oracle is still one of the most (if not the most) powerful and advanced relational databases in the world. Its place in the pantheon of database kings is undoubtedly safe.

The paradigm shift
With that introduction out of the way, let’s talk about how the database industry is changing and why many customers choose to think outside the box and adopt cloud-based solutions as alternatives for commercial databases. Even applications that require the highest levels of performance and availability or those that run on high-end commercial databases (such as Oracle RAC) can now be safely powered by MySQL and PostgreSQL databases—albeit with some Amazon “special sauce” on top. But more on that later.

There’s a fundamental shift happening in the database landscape as customers transition their data architectures from being monolithic—where a single, big, and feature-full relational database powers their entire solution stack—to a more microservices-oriented model. In this model, different databases (some relational and some NoSQL) provide services to parts of the solution. This “best-of-breed” approach makes perfect sense as emerging database technologies become more mature, robust, and feature-rich.

In addition, there’s a huge upward trend in the adoption of cloud services. Even the most traditional organization can see the benefits of tapping into the power that cloud-centric architectures can provide in performance, flexibility, high availability, and reduced total cost of ownership (TCO).

When you combine that with the huge and rapidly increasing adoption of open source–based relational databases (mainly MySQL and PostgreSQL), you can clearly see that the database industry is moving along a different trajectory from the classic “one commercial database to rule them all” approach that was once common.

This combination of rapid advancements in open source relational databases and the new breed of cloud database solutions can help customers consider alternatives to commercial Oracle RAC databases, including Amazon Aurora.

Now, let’s dig deeper…

Oracle RAC architecture
My first step is to flesh out the major benefits that Oracle Real Application Clusters (Oracle RAC) provides to its customers. Since this is just a blog post and not a full-blown research paper, I can’t cover every single aspect and benefit of Oracle RAC. As such, I primarily focus on the standard and most widely used capabilities of the technology.

Oracle RAC is one of the major differentiating features for an Oracle database when compared to other relational database technologies. Oracle RAC is the technology that allows an Oracle database to provide increased levels of high availability and big performance benefits.

Oracle RAC is an Active/Active database cluster, where multiple Oracle database servers (running Oracle Database instances, which are a collection of in-memory processes and caches) access a shared storage device that contains a single set of disk-persistent database files. This architecture is considerably different from what you usually find in a non-Oracle database cluster. Instead of each database node having its own dedicated storage, all the database nodes coordinate and share access to the same physical disks.

All the database nodes coordinate with one another using both a dedicated network-based communication channel (known as the cluster interconnect) and a set of disk-based files.

Public access to the cluster (from incoming applications, SQL queries, users, etc.) is performed using a set of SCAN IPs that are used to load-balance incoming sessions. In addition, each RAC cluster node has its own physical and virtual IP addresses that can be used to open a connection directly to a specific node.

Greatly simplified Oracle RAC architecture
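To make the SCAN/VIP distinction concrete, here is a small Python sketch that builds Oracle Easy Connect strings (`host:port/service_name`) for both access paths. The host names, port, and service name are hypothetical placeholders. A string like this could be passed, for example, as the `dsn` argument to python-oracledb's `connect()`.

```python
# Hypothetical names; substitute the values of your own cluster.
SCAN_NAME = "rac-scan.example.com"          # resolves to several SCAN IPs
NODE_VIPS = ["rac-node1-vip.example.com",   # per-node virtual IP host names
             "rac-node2-vip.example.com"]
SERVICE = "sales.example.com"
PORT = 1521


def easy_connect(host: str, port: int, service: str) -> str:
    """Build an Oracle Easy Connect string of the form host:port/service_name."""
    return f"{host}:{port}/{service}"


# Load-balanced access: the client only knows the SCAN name.
scan_dsn = easy_connect(SCAN_NAME, PORT, SERVICE)

# Direct access: one connect string per node, each pinned to that node's VIP.
node_dsns = [easy_connect(vip, PORT, SERVICE) for vip in NODE_VIPS]

if __name__ == "__main__":
    print(scan_dsn)
    for dsn in node_dsns:
        print(dsn)
```

Clients that use the SCAN form let the cluster choose an instance for each new session. The VIP form ties the session to one specific node.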

Because of the shared nature of the RAC cluster architecture—specifically, having all nodes write to a single set of database data files on disk—the following two special coordination mechanisms were implemented to ensure that the Oracle database objects and data maintain ACID compliance:

  • GCS (Global Cache Services) tracks the location and the status of the database data blocks and helps guarantee data integrity for global access across all cluster nodes.
  • GES (Global Enqueue Services) performs concurrency control across all cluster nodes by managing global locks (enqueues), such as dictionary cache locks and transaction locks.

These services, which run as background processes on each cluster node, are essential to serialize access to shared data structures in the Oracle database.
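The bookkeeping that GCS performs can be pictured as a directory that records which instance holds each block and in which mode. The following Python sketch is a deliberately toy, single-process model of that idea. It is not how Oracle implements GCS, the class and method names are hypothetical, and GES-managed enqueues are ignored entirely.

```python
from __future__ import annotations

from enum import Enum


class Mode(Enum):
    SHARED = "S"      # several instances may read the block
    EXCLUSIVE = "X"   # one instance may modify the block


class ResourceDirectory:
    """Toy GCS-like directory: which instance holds which block, in which mode."""

    def __init__(self) -> None:
        self.holders: dict[int, dict[str, Mode]] = {}

    def request(self, instance: str, block: int, mode: Mode) -> list[str]:
        """Grant `mode` on `block` to `instance`; return instances whose copies were revoked."""
        current = self.holders.setdefault(block, {})
        revoked: list[str] = []
        if mode is Mode.EXCLUSIVE:
            # A writer needs sole ownership, so every other holder loses its copy.
            revoked = [other for other in current if other != instance]
        else:
            # Readers can coexist, but an exclusive holder is downgraded to shared.
            for other, held in list(current.items()):
                if other != instance and held is Mode.EXCLUSIVE:
                    current[other] = Mode.SHARED
        for other in revoked:
            del current[other]
        current[instance] = mode
        return revoked
```

A request in exclusive mode removes every other holder's copy. That is the kind of cross-node coordination that makes concurrent writes to the same block expensive.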

Shared storage is another essential component in the Oracle RAC architecture. All cluster nodes read and write data to the same physical database files, stored on disks that are accessible by all nodes. Most customers rely on high-end storage hardware to provide the shared storage capabilities required for RAC.

In addition, Oracle provides its own proprietary software-based storage/disk management mechanism called Automatic Storage Management, or ASM. ASM is implemented as a set of special background processes that run on all cluster nodes and allow for easier management of the database storage layer.

So, to recap, the main components of an Oracle RAC architecture include the following:

  • Cluster nodes: A set of one or more servers running Oracle Database instances, each with a collection of in-memory processes and caches.
  • Interconnect network: Cluster nodes communicate with one another using a dedicated “interconnect” network.
  • Shared storage: All cluster nodes access the same physical disks and coordinate access to a single set of database data files that contain user data. Usually handled by a combination of enterprise-grade storage with Oracle’s ASM software layer.
  • SCAN (Single Client Access Name): “Floating” virtual hostname/IPs providing load-balancing capabilities across cluster nodes. Name resolution of SCAN to IP can be done via DNS or GNS (Grid Naming Service).
  • Virtual IPs (and physical IPs): Each cluster node has its own dedicated physical and virtual IP addresses, which can be used to connect directly to that node.

Performance and scale-out in Oracle RAC
With Oracle RAC, you can add new nodes to an existing RAC cluster without downtime. Adding more nodes to the RAC cluster increases the level of high availability that’s provided and also enhances performance.

Although you can scale read performance easily by adding more cluster nodes, write performance is a more complex subject. When multiple sessions try to modify rows that reside in the same physical Oracle block (the lowest level of logical I/O performed by the database), they can cause write overhead for the requested block and affect write performance.

Suppose that you have two sessions connected to two RAC instances (A and B), and both try to modify the same Oracle data block. In this case, instance A creates a “past image” of the block in its own buffer cache, generates the redo entries, and transmits the block to instance B over the interconnect. This process can add overhead and contention under certain write conditions, so scaling writes isn’t as linear as scaling reads.
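The effect of this scenario on the interconnect can be estimated with a simple counter. This Python sketch assumes a simplified model in which a block moves between instances whenever a different instance modifies it. Past images, redo generation, and lock messages are left out, and the function name is hypothetical.

```python
from __future__ import annotations


def count_block_transfers(writes: list[tuple[str, int]]) -> int:
    """Count interconnect block transfers for a sequence of (instance, block) writes.

    Simplified cache-fusion model: the instance that last modified a block owns it;
    when a different instance modifies it, the block is shipped once and ownership moves.
    """
    owner: dict[int, str] = {}
    transfers = 0
    for instance, block in writes:
        previous = owner.get(block)
        if previous is not None and previous != instance:
            transfers += 1  # block shipped from `previous` to `instance`
        owner[block] = instance
    return transfers


# Sessions on instances A and B take turns updating rows in the same block 7.
hot_block = [("A", 7), ("B", 7)] * 5
# The same number of writes, but each instance works on its own block.
partitioned = [("A", 7), ("B", 8)] * 5
```

With ten alternating writes to one hot block, nine transfers cross the interconnect. When each instance works on its own block, none do. This is the intuition behind the service-based partitioning described next.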

Because of this limitation in write scale-out, many Oracle RAC customers choose to split their RAC clusters into multiple “services,” which are logical groupings of nodes in the same RAC cluster. By using services, you can direct writes to specific cluster nodes. This is usually done in one of two ways (as shown in the diagram that follows):

  1. Splitting writes from different individual “modules” in the application (that is, groups of independent tables) to different nodes in the cluster. This is also known as “application partitioning” (not to be confused with database table partitions).
  2. In extreme concurrency cases, directing all writes to a single RAC node and load-balancing only the reads.
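Both routing patterns can be sketched in Python as follows. In a real RAC cluster, service placement is configured on the cluster side (for example with `srvctl`), not in application code. The module names, node names, and round-robin reader rotation here are hypothetical simplifications.

```python
from __future__ import annotations

from itertools import cycle
from typing import Callable

# Option 1: application partitioning. Each module's service is placed on its
# own node, so writes to that module's tables stay on a single instance.
MODULE_NODES = {
    "orders": "node1",   # hypothetical module -> node placement
    "billing": "node2",
}


def route_partitioned(module: str) -> str:
    """Return the node that serves reads and writes for `module`."""
    return MODULE_NODES[module]


# Option 2: every write goes to one node; reads are load-balanced.
def make_single_writer_router(write_node: str, read_nodes: list[str]) -> Callable[[bool], str]:
    readers = cycle(read_nodes)  # simple round-robin stand-in for load balancing

    def route(is_write: bool) -> str:
        return write_node if is_write else next(readers)

    return route
```

Option 1 avoids block shipping between modules. Option 2 gives up write scale-out entirely in exchange for zero write contention between nodes.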

Major benefits of Oracle RAC
To recap, Oracle Real Application Clusters provides two major benefits that drive customer adoption:

  • Multiple database nodes within a single RAC cluster provide increased high availability. No single point of failure exists from the database servers themselves. However, the shared storage requires storage-based high availability solutions.
  • Multiple cluster database nodes allow for scaling-out query performance across multiple servers.

Amazon Aurora architecture
Aurora is Amazon’s flagship cloud database solution. When creating Amazon Aurora cluster databases, you can choose between MySQL and PostgreSQL compatibility.

Aurora extends the “vanilla” versions of MySQL and PostgreSQL in two major ways:

  • Adding enhancements to the MySQL/PostgreSQL database kernel itself to improve performance (concurrency, locking, multithreading, etc.)
  • Using the capabilities of the AWS ecosystem for greater high availability, disaster recovery, and backup/recovery functionality

Aurora adds these enhancements without changing the database optimizer and SQL parser, so the changes are transparent to the application. If you provision an Aurora cluster with MySQL compatibility, as the name suggests, MySQL-compatible applications can generally run against it without modification.

When comparing the Amazon Aurora architecture to Oracle RAC, you can see major differences in how Amazon chooses to provide scalability and increased high availability in Aurora. These differences are due mainly to the existing capabilities of MySQL/PostgreSQL and the strengths that the AWS backend can provide in terms of networking and storage.

Instead of having multiple cluster nodes write to a shared disk, Aurora is a master/replica cluster. Whereas Oracle RAC uses a set of background processes to coordinate writes across all cluster nodes, Amazon Aurora uses a much simpler concept: asynchronous replication. (But on steroids!)

Let’s go deeper…

Each Aurora cluster can have one or more cluster nodes, and each node plays one of two roles:

  1. At any given time, a single node functions as the master that handles both writes and reads from your applications.
  2. In addition to the master, up to 15 replicas can be created, which are used for two purposes:
       • For performance and read scalability: As read-only nodes for queries/report-type workloads.
       • For high availability: As failover nodes in case the master fails. Each replica can be located in a different AWS Availability Zone. A single Availability Zone can host more than one Aurora Replica.
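The choice of which replica becomes the new master is governed by promotion tiers. Each Aurora Replica can be assigned a tier from 0 (highest priority) to 15. As I understand the AWS documentation, Aurora promotes a replica from the highest-priority tier and, among replicas in the same tier, prefers the largest one. The Python sketch below models that rule under those assumptions. The `size` field and the name-based final tie-breaker are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    name: str
    tier: int            # promotion tier: 0 is the highest priority, 15 the lowest
    size: int            # hypothetical stand-in for instance size (e.g. vCPU count)
    healthy: bool = True


def pick_failover_target(replicas: List[Replica]) -> Optional[Replica]:
    """Return the replica to promote when the master fails, or None if none is usable."""
    candidates = [r for r in replicas if r.healthy]
    if not candidates:
        return None
    # Lowest tier wins; within a tier the larger instance wins.
    # The name is only a deterministic tie-breaker for this sketch.
    return min(candidates, key=lambda r: (r.tier, -r.size, r.name))
```

Placing a replica of the same size as the master in tier 0 is a common way to make sure a failover does not leave the cluster with an underpowered writer.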

The following is a high-level Aurora architecture diagram showing four cluster nodes: one master and three replicas. The master node is located in Availability Zone A, the first replica in Availability Zone B, and the second and third replicas in Availability Zone C.
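A planned layout like the one in this diagram can be sanity-checked in a few lines of Python. This sketch encodes only the rules stated in this section: exactly one master and at most 15 replicas. It also reports whether any replica sits outside the master's Availability Zone. The function names and the `AZ-A`/`AZ-B`/`AZ-C` labels are hypothetical.

```python
from __future__ import annotations

from collections import Counter

MAX_REPLICAS = 15  # one master plus up to 15 replicas, as described above


def validate_topology(nodes: dict[str, tuple[str, str]]) -> Counter[str]:
    """Check a planned layout and return the number of nodes per Availability Zone.

    `nodes` maps a node name to (role, availability_zone), where role is
    "master" or "replica". Raises ValueError for an invalid layout.
    """
    roles = Counter(role for role, _ in nodes.values())
    if set(roles) - {"master", "replica"}:
        raise ValueError("roles must be 'master' or 'replica'")
    if roles["master"] != 1:
        raise ValueError("exactly one master (writer) is required")
    if roles["replica"] > MAX_REPLICAS:
        raise ValueError(f"at most {MAX_REPLICAS} replicas are allowed")
    return Counter(az for _, az in nodes.values())


def has_cross_az_failover(nodes: dict[str, tuple[str, str]]) -> bool:
    """True if at least one replica lives outside the master's Availability Zone."""
    master_az = next(az for role, az in nodes.values() if role == "master")
    return any(role == "replica" and az != master_az for role, az in nodes.values())


# The four-node layout from the diagram (AZ labels are placeholders).
LAYOUT = {
    "master": ("master", "AZ-A"),
    "replica-1": ("replica", "AZ-B"),
    "replica-2": ("replica", "AZ-C"),
    "replica-3": ("replica", "AZ-C"),
}
```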

Each database node in an Aurora cluster reads from a single distributed cluster volume that keeps copies of the data across multiple Availability Zones, but only the master writes to it. In terms of write coordination, this is closer to a shared-nothing architecture than to the Oracle RAC shared-everything architecture.

Because of this design, Amazon Aurora has no need for special processes to serialize access to shared data (akin to GCS/GES in Oracle RAC). With a single writer, no competing updates to the same blocks arrive from different nodes, so there is nothing to arbitrate.

In addition, Oracle RAC uses a combination of server-side software storage management (ASM) and high-end enterprise storage devices that must be manually configured and actively maintained by database administrators and storage experts. Aurora uses the existing AWS storage ecosystem and hides most of the storage complexity from the customer. Furthermore, due to its shared-nothing architecture, Aurora is much simpler.

To provide failover capabilities to connecting applications, Aurora offers two “endpoints” (which very roughly translate to virtual IPs) for cluster access:

  • Cluster endpoint: Connects you to the primary instance for the DB cluster. You can perform both read and write operations using the cluster endpoint.
  • Reader endpoint: Load-balances connections across the replicas in the DB cluster. This spreads the workload across your replicas and makes better use of the resources available in the cluster.
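An application typically uses the two endpoints by sending writes, and anything that must see its own writes, to the cluster endpoint, and read-only queries to the reader endpoint. The Python sketch below shows a naive statement-based router. The endpoint host names are placeholders modeled on the usual `cluster-`/`cluster-ro-` naming pattern (copy the real values from the RDS console or CLI). The prefix-based classification is a simplifying assumption, not a robust SQL parser.

```python
# Placeholder endpoint names; use the values shown for your own cluster.
CLUSTER_ENDPOINT = "mycluster.cluster-abc123example.us-east-1.rds.amazonaws.com"
READER_ENDPOINT = "mycluster.cluster-ro-abc123example.us-east-1.rds.amazonaws.com"

# Hypothetical, deliberately naive list of read-only statement prefixes.
READ_ONLY_PREFIXES = ("select", "show", "explain")


def endpoint_for(sql: str, in_transaction: bool = False) -> str:
    """Pick the endpoint for one statement (naive classification, for illustration)."""
    statement = sql.lstrip().lower()
    if in_transaction:
        # Keep a whole transaction on the writer so it sees its own changes.
        return CLUSTER_ENDPOINT
    if statement.startswith(READ_ONLY_PREFIXES) and "for update" not in statement:
        return READER_ENDPOINT
    # Writes, locking reads, DDL, and anything unrecognized go to the master.
    return CLUSTER_ENDPOINT
```

Keeping open transactions on the cluster endpoint matters because replicas apply changes asynchronously. A read sent to a replica right after a write might not see that write yet.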

Performance and scale-out in Aurora
By using the cluster endpoint and the reader endpoint in Aurora, you can scale reads across the cluster. Writes are always funneled to the “master” node.

Although the capability to only scale reads might seem like a limitation when compared to Oracle RAC (which can scale both), it’s usually not a major issue for most types of applications for the following reasons:

  • Most relational database management system (RDBMS)-centric applications are read-heavy. A mix of roughly 70 percent reads to 30 percent writes is common.
  • Scaling read performance is often more crucial, especially when combining OLTP workloads (transactions) with analytical-type workloads, such as reports.
  • Even with Oracle RAC, which can scale writes to some extent, many customers choose to partition the read/write workload to specific nodes in their RAC cluster, because of the concurrency issues that occur when multiple sessions try to modify rows in the same Oracle block.
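A back-of-the-envelope calculation shows why a read-heavy mix benefits from read replicas, and where the single writer becomes the ceiling. The Python function below is a simplified model, not an Aurora sizing method. It assumes that every operation costs the same, that the master handles only writes, that reads are spread evenly across the replicas, and that each node saturates at the same capacity.

```python
def max_cluster_throughput(read_fraction: float, node_capacity: float, replicas: int) -> float:
    """Upper bound on total operations/s under the simplified model above."""
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    write_fraction = 1.0 - read_fraction
    limits = []
    if write_fraction > 0:
        # The master saturates once write_fraction * total == node_capacity.
        limits.append(node_capacity / write_fraction)
    if read_fraction > 0:
        # The replicas saturate once read_fraction * total == replicas * node_capacity.
        limits.append(replicas * node_capacity / read_fraction)
    return min(limits)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(n, round(max_cluster_throughput(0.7, 1000, n)))
```

With a 70/30 mix and nodes that each handle 1,000 operations/s, one replica caps the cluster at about 1,429 operations/s and two at about 2,857. From three replicas upward, the master's write load becomes the ceiling at about 3,333 operations/s.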

The reader endpoint also enhances high availability. If an AWS Availability Zone fails, the application’s use of the reader endpoint continues to send read traffic to the other replicas with minimal disruption.
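The effect of an Availability Zone failure on reader traffic can be sketched as follows, reusing the three-replica layout from the diagram described earlier. This Python model rests on stated assumptions. The real reader endpoint balances new connections rather than individual queries, and its exact algorithm is not reproduced here. The round-robin distribution, the replica names, and the AZ labels are hypothetical.

```python
from __future__ import annotations


def distribute_reads(replicas: dict[str, str], failed_azs: set[str], connections: int) -> dict[str, int]:
    """Spread new reader connections round-robin over replicas whose AZ is still up."""
    available = sorted(name for name, az in replicas.items() if az not in failed_azs)
    if not available:
        # This sketch simply gives up when no replica is left.
        raise RuntimeError("no healthy replica is left behind the reader endpoint")
    counts = {name: 0 for name in available}
    for i in range(connections):
        counts[available[i % len(available)]] += 1
    return counts


# Replica placement from the diagram (AZ labels are placeholders).
REPLICAS = {"replica-1": "AZ-B", "replica-2": "AZ-C", "replica-3": "AZ-C"}
```

If Availability Zone C fails, every new reader connection lands on the remaining replica in Availability Zone B. The application keeps using the same endpoint name throughout.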

Major benefits of Amazon Aurora
To summarize, Amazon Aurora provides the following benefits:

  • Multiple cluster database nodes provide increased high availability. There is no single point of failure from the database servers.
  • AWS managed storage nodes also provide high availability for the storage tier. Aurora keeps six copies of your data across three Availability Zones, in an architecture designed for zero data loss.
  • Multiple cluster database nodes allow for scaling-out query read performance across multiple servers.
  • Greatly reduced operational overhead using a cloud solution and reduced TCO by using AWS and open source database engines.
  • Automatic management of storage. No need to pre-provision storage for a database. Storage is automatically added as needed, and you only pay for one copy of your data.
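The storage tier's availability rests on quorums. Aurora's published design keeps six copies of each piece of data, two in each of three Availability Zones. A write requires 4 of the 6 copies and a read requires 3 of 6. Because 4 + 3 > 6, every read quorum overlaps the latest write quorum, and because 4 + 4 > 6, two writes can never succeed on disjoint sets of copies. The Python sketch below checks those thresholds for a given failure scenario. The function name and AZ labels are hypothetical.

```python
from __future__ import annotations

AZS = ("AZ-A", "AZ-B", "AZ-C")   # placeholder Availability Zone labels
COPIES_PER_AZ = 2                # 2 copies x 3 AZs = 6 copies
WRITE_QUORUM = 4                 # copies that must acknowledge a write
READ_QUORUM = 3                  # copies that must answer a read


def quorum_status(failed_azs: set[str], extra_failed_copies: int = 0) -> tuple[bool, bool]:
    """Return (can_write, can_read) after losing whole AZs plus some extra copies."""
    total = COPIES_PER_AZ * len(AZS)
    lost = COPIES_PER_AZ * len(failed_azs & set(AZS)) + extra_failed_copies
    available = max(total - lost, 0)
    return available >= WRITE_QUORUM, available >= READ_QUORUM
```

Losing a whole Availability Zone leaves four copies, so both reads and writes continue. Losing an Availability Zone plus one more copy leaves three, which is enough to read but not to write.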

The biggest benefits of adopting Amazon Aurora instead of Oracle RAC are reduced operational complexity (after all, it’s a cloud solution) and reduced TCO. You also gain high-availability capabilities and performance usually found only in commercial databases.

When comparing the Oracle RAC and Aurora architectures side by side, you can see that Aurora can greatly simplify the database deployment stack and thus reduce complexity and operational overhead.

Feature | Oracle RAC | Amazon Aurora
Storage | Enterprise-grade storage + ASM | Aurora storage nodes: distributed, low-latency storage engine spanning multiple AZs
Cluster type | Active/Active, shared-everything; all nodes open for R/W | Active/Active, shared-nothing; master node open for R/W, replica nodes open for reads
Cluster virtual IPs | R/W load balancing: SCAN IP | R/W: cluster endpoint; read load balancing: reader endpoint
Internode coordination | Cache Fusion + GCS + GES | Storage-based replication
Internode private network | Interconnect | Not required
Transaction (write) time to recover (TTR) from node failure | 0–60 seconds | Typically < 60 seconds
Application (read) TTR from node failure | Immediate | Immediate
Max number of cluster nodes | 100 (theoretical) | 16 (1 master + 15 replicas)
Provides built-in read scaling | Yes | Yes
Provides built-in write scaling | Yes (with limitations*) | No
Data loss in case of node failure | No data loss | No data loss
Replication latency | N/A | Milliseconds
Operational complexity | Substantial | Low
Scale-up nodes | Difficult | Easy using the AWS UI/CLI
Scale-out cluster | Provision, deploy, and configure new servers | Easy using the AWS UI/CLI
Database engine | Proprietary | Open source compatible (MySQL/PostgreSQL)

* Write performance can be limited and affect scale-out capabilities if multiple sessions try to modify rows contained in the same Oracle block or the same row.

Summary
With the increasing availability of cloud-based solutions, many organizations are looking for a relational database engine that can provide levels of high availability and scalability similar to Oracle RAC—but in a cloud/PaaS model. Amazon Aurora can serve as a powerful alternative solution, using AWS ecosystem components and open source database engines to greatly reduce complexity and operational overhead.