AWS Database Blog

How to Migrate Your Data Warehouse to Amazon Redshift Using the AWS Schema Conversion Tool Data Extractors

Michael Soo is a database engineer at Amazon Web Services.

This blog post gives you a quick overview of how you can use the AWS Schema Conversion Tool (AWS SCT) data extractors to migrate your existing data warehouse to Amazon Redshift. Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to analyze all your data using your existing business intelligence tools. Migrating your data warehouse to Amazon Redshift can substantially improve query and data load performance, increase scalability, and save costs. AWS provides the Schema Conversion Tool free of charge to help you migrate your legacy data warehouse to the cloud.

Customers usually use SCT to migrate their schema objects (table definitions, indexes, triggers, and other execution logic) from legacy data warehouse platforms. SCT has also been enhanced to migrate the actual data records. This feature, called the SCT data extractors, can migrate your data from any legacy platform (Oracle, Greenplum, Netezza, Teradata, Microsoft SQL Server, or Vertica), using a parallel, secure, and fault-tolerant architecture.

Typically, there are two stages to migrating a data warehouse. In the first stage, a full copy of the legacy data is replicated to the target system. We call this stage the “full load.” The full load can take a considerable amount of time depending on how much data there is to move, so any changes that occur on the source in the meantime are usually captured in a subsequent change data capture stage.

The SCT extractors are designed to facilitate the full load process, with two key features:

  • Quickly move large numbers of records using multi-threaded data extraction and load processes
  • In case of failure, an extraction process can restart from the point of failure, rather than from the beginning of the entire data extract

In the second stage, SCT provides support for point-in-time data extracts so that “change deltas” since the full load can be captured and migrated in a second step.

Let’s take a look at the architecture of the SCT extractors and how they can help you accomplish the full data load process.

Architecture
You can install the SCT extractors on any server machines with connectivity to your source data warehouse. The SCT client application issues commands to the extractors to pull data from the source system, upload the data to Amazon S3, and copy the data into Amazon Redshift. You can install multiple extractors on the same machine, or separate them as needed across many machines; which is best depends on your available server resources and network capacity. In general, a best practice is to locate the data extractors close to your data warehouse to reduce network traffic and latency.

The extractors provide a fully managed framework for your data warehouse migration. The data extractors write their extract files to local storage. The extract files are compressed, using lossless LZO compression, then copied to an S3 bucket. The files are compressed to reduce network transfer time and to reduce storage costs on S3. After the files are uploaded to S3, the extractor triggers Amazon Redshift to copy the data into the target tables using the standard Amazon Redshift COPY command. Each of these steps can be repeated if necessary with no unexpected consequences on the target database.
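To make the final step concrete, the following is a minimal sketch of the kind of COPY the extractors trigger once an extract file is in S3, shown here in Python with psycopg2. The cluster endpoint, database, table, bucket, and IAM role are placeholder assumptions, and the exact statement SCT generates may differ.

import psycopg2  # assumes the psycopg2 driver is installed

# Hypothetical connection and object names -- replace with your own.
conn = psycopg2.connect(
    host="my-redshift-cluster.example.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="dw", user="admin", password="..."
)
conn.autocommit = True

copy_sql = """
    COPY sales_fact
    FROM 's3://my-sct-extract-bucket/sales_fact/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
    LZOP                 -- extract files are LZO-compressed
    DELIMITER '|'
    REGION 'us-east-1';
"""

with conn.cursor() as cur:
    cur.execute(copy_sql)   # Amazon Redshift pulls the files directly from S3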

For security purposes, SCT supports encrypted communication between the SCT client and the extractors using Secure Sockets Layer (SSL). If you want to learn more about using SSL with the extractors, a forthcoming companion post, How to Use SSL with the AWS Schema Conversion Tool Data Extractors, explains that process. However, before moving on to SSL, we recommend that you read this post first to understand the overall extractor architecture and setup.

Now that we know what the architecture looks like, let’s take a look at the steps involved in setting up the data extractors.

The scenario we work through in this post is migrating an Oracle data warehouse to Amazon Redshift. We assume that you’ve already converted your source schema to Amazon Redshift and applied it on the target database. We won’t try to cover all possible migration scenarios—our intent is to give you a deep dive into the technology so you can confidently manage your migration projects.

Our Oracle database is hosted in Amazon RDS, but the same migration considerations apply to on-premises sources. We’ll install two extractor agents running on Red Hat Linux on Amazon EC2 instances. We’ll only work through the configuration of one extractor; the process is the same for both. The servers are configured as EC2 t2.medium instances (two virtual CPUs, 4 GB of memory, and 100 GB of local storage). The instance size can be tuned for your particular data volume, latency, and performance requirements.

The extractors are controlled by an SCT client running on a Mac laptop. Because the client offloads data processing chores to the extractors, a small desktop or laptop is perfectly suited to run the client.

A great feature of the SCT client is that it doesn’t need to be online while the data extractors are running. The SCT client can reconnect to the extractors at any time by opening the specific migration project. One client can manage many extraction projects. The data extractors continue to operate even if the SCT client is unreachable for any reason.

(more…)

Amazon Aurora Under the Hood: Quorum Reads and Mutating State

Anurag Gupta runs a number of AWS database services, including Amazon Aurora, which he helped design. In this under the hood series, Anurag discusses the design considerations and technology underpinning Aurora.

In my last post, I talked about the benefits of using a quorum model. I discussed how such systems are resilient in the face of latency outliers, short periods of unavailability, and the long-term loss of disks and nodes. That raises the obvious question—if quorums are so awesome, why doesn’t everyone use them?

Reads become slow in quorum systems
One issue is that reads become slow in quorum systems. Quorum models require that the read quorum and the write quorum overlap in at least one member. In a system like Amazon Aurora with six copies of data and a write quorum of four, that means you’d need to read three copies of data to ensure you overlapped with the write quorum.

That’s unfortunate. Usually when you’re reading a database page, it means that you’ve missed in the buffer cache, and your SQL statement is blocked waiting for the I/O before it can proceed. To read three copies of data, you want to issue requests to about five, to mask outlier latency and intermittent availability issues. Doing that puts a burden on the network—database pages are fairly large and the read amplification is meaningful. The performance of quorum reads doesn’t compare well to a traditional replication system, where data is written to all copies but can be read from any single one of them.

However, Aurora avoids quorum amplification during writes. We do write out six copies, but we only write log records, not full data pages. The data pages are assembled within the storage node from prior versions of the data page and the incoming log. We can also write asynchronously. Neither of these optimizations is possible for reads.

How to avoid read-quorum overhead
The overhead of read quorums is a clear disadvantage of quorum systems. How do we avoid it? The key thing is to use state.

State is often considered a dirty word in distributed systems—it is hard to manage and coordinate consistent state as you scale nodes and encounter faults. Of course, the entire purpose of database systems is to manage state, providing atomicity, consistency, isolation, and durability (ACID). Aurora sits at the intersection of these two technology domains. Much of our innovation has come from applying concepts from one domain to drive progress in the other.

Although it is difficult to establish consensus on distributed state without communication, there are often local oases of consistency that you can use to avoid the need for consensus, coordination, or locking. The specific example we apply here is that of read views. Many database systems have a similar concept, but let’s focus on MySQL.

MySQL, like all relational databases, needs to provide ACID support. A read view establishes a logical point in time, before which a SQL statement must see all changes that have been committed and must not see any changes that are not yet committed. MySQL does this by establishing the log sequence number (LSN) of the most recent commit. This commit LSN makes all changes committed before that point visible, and an active transactions list establishes the changes that should not be seen. When a statement with a particular read view inspects a data page, it needs to back out any changes made by transactions that were active when the read view was established, even if those changes have since committed, as well as changes from any transactions that started after the read-point commit LSN. When a transaction establishes a read view, it can isolate itself from all other changes going on in the system—if it can backtrack to a suitably consistent point in time.

What does this have to do with read quorums? Everything. The database is continually writing changes to storage nodes. It marks an individual change as durable once it receives four acknowledgements back. It advances the volume durable point when all changes before that point have been registered as individually durable. As part of the bookkeeping it performs to do this work, it knows which storage nodes have acknowledged which write requests and can therefore be queried to see those changes. When a read request comes in, the request has a read-point commit LSN that the database needs to see. The database can simply dispatch the request to any storage node it knows to be complete at or beyond the read-point commit LSN.
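Here is a minimal sketch of that dispatch decision, assuming a simple in-memory map from each storage node to the highest LSN it is known to be complete through. It illustrates the idea only; the names and structure are not Aurora's actual implementation.

import random

# Hypothetical bookkeeping: highest LSN each storage node is known
# to be complete through, based on the write acknowledgements received.
node_complete_lsn = {
    "node-a": 10_450, "node-b": 10_450, "node-c": 10_390,
    "node-d": 10_450, "node-e": 10_200, "node-f": 10_450,
}

def pick_read_node(read_point_lsn):
    """Return one node that already has everything up to read_point_lsn.

    No read quorum is needed: the bookkeeping tells us which single
    node is guaranteed to have the version of the page we need.
    """
    candidates = [n for n, lsn in node_complete_lsn.items()
                  if lsn >= read_point_lsn]
    if not candidates:
        raise RuntimeError("no node caught up yet; wait or fall back to a quorum read")
    return random.choice(candidates)  # could also prefer the lowest-latency node

print(pick_read_node(read_point_lsn=10_400))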

This approach uses the bookkeeping state that we have to maintain anyway to avoid a quorum read. Instead, we read from a node that we know has the data version we need. This approach avoids considerable network, storage node, and database node processing.

(more…)

Building Distributed Locks with the DynamoDB Lock Client


Alexander Patrikalakis and Sasha Slutsker are senior software development engineers with Amazon.

TL;DR
At Amazon, teams all across the company build upon AWS services just like AWS’s external customers. Here we have a contribution from Alex and Sasha, who work on teams supporting Amazon.com, about a new client library that they developed to make their applications better. We hope you’ll find it useful too. The DynamoDB Lock Client is a Java library widely used inside Amazon that enables you to solve distributed computing problems like leader election and distributed locking with client-only code and a DynamoDB table.

DynamoDB supports mechanisms, like conditional writes, that are necessary for distributed locks. However, the AWS SDK doesn’t include the logic needed to actually implement distributed locks. The DynamoDB Lock Client wraps up the necessary client logic for distributed advisory locks in an easy-to-use client interface. The protocol in the lock client Java library is widely applicable, and we encourage you to apply it in other languages.
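To make the idea concrete, here is a hedged sketch of such a conditional write in Python with boto3. It illustrates the underlying DynamoDB primitive, not the Java library's API; the table name (locks, with a partition key named key), the item attributes, and the retry and heartbeat logic of the real client are assumptions or omissions.

import socket
import time
import uuid

import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")
TABLE = "locks"  # hypothetical table with partition key "key"

def try_acquire_lock(lock_key, lease_duration_ms=10_000):
    """Attempt to create the lock item; fails if someone else holds it."""
    try:
        dynamodb.put_item(
            TableName=TABLE,
            Item={
                "key": {"S": lock_key},
                "owner": {"S": socket.gethostname()},
                "leaseDurationMs": {"N": str(lease_duration_ms)},
                "recordVersionNumber": {"S": str(uuid.uuid4())},
                "createdLocalTimeMs": {"N": str(int(time.time() * 1000))},
            },
            # The write succeeds only if no lock item exists yet.
            ConditionExpression="attribute_not_exists(#k)",
            ExpressionAttributeNames={"#k": "key"},
        )
        return True
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # someone else holds the lock
        raise

print("acquired" if try_acquire_lock("Moe") else "lock is held by another worker")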

Background
Distributed locking can be a complicated challenge to solve, because you need to atomically ensure only one actor is modifying a stateful resource at any given time. For example, perhaps you have a database that serves as the central source of truth for your application. To ensure data is always accurate, you need to ensure that only one application server in your application server fleet is modifying a particular record in your database.

In AWS, the many moving parts of Amazon EC2 also need to agree on what their configuration should be so that they can survive many different failure modes. A primitive that enables this consensus must also be highly available, consistent, and partition-tolerant. Ideally, this primitive is unaffected by clock skew across large fleets of machines.

Systems like Raft and Paxos were designed to address these challenges, but they are notoriously difficult to implement and operate. What if you just want to easily implement a lock for a simple distributed application you’re writing? Amazon EC2 does just this with DynamoDB. We’re happy to present the DynamoDB Lock Client, a Java client library that uses locks to coordinate cluster configuration independently of system time, built on DynamoDB primitives like consistent reads and conditional writes. We did the baking so you can have your cake, too.

A practical example
Let’s suppose that you are a retail bank that wants to ensure that at most one customer service representative changes customer details at a time. From a fraud prevention perspective and from a consistency perspective, temporarily locking customer records during an update makes a lot of sense. In this case, each customer has a unique identifier. The bank stores customer information like account balances, transactions, addresses, contact information, and relationship history in many different tables. To make their system scale horizontally, this bank doesn’t embed foreign key relationships in any of their tables. Each table is isolated from all others. The bank uses their application layer to coordinate changes to each customer in a distributed fashion.

The tables are independent, so you can’t just wrap the changes you need in a relational transaction. Instead, you can lock the customer’s unique identifier at a high level. Alternatively, you can lock the unique identifiers of customer details (addresses and telephone numbers) at a finer-grained level. You’d do so with a locking API action for a certain duration in your application before making any changes.

The DynamoDB Lock Client implements a protocol allowing similar applications to take advisory locks on any part of your problem domain, big or small. This protocol ensures your players “stay in possession of the ball” for a certain period of time.

The locking protocol
For a new lock, the lock client stores a lock item in the lock table. With the item, it stores the host name of the owner, the lease duration in milliseconds, a UUID unique to the host, and the host system clock time when the lock was initially created. The lock table looks something like this in the AWS Management Console.
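What makes this work without synchronized clocks is how other clients treat the record version number (RVN): a client that wants the lock re-reads the item and, if the RVN has not changed after the lease duration has elapsed on its own clock, treats the lock as expired and tries to take it over with a conditional write on that RVN. The following is a simplified, illustrative Python sketch of that check, reusing the hypothetical locks table from the earlier example; the real Java client adds heartbeating, polling, and retries.

import time
import uuid

import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")
TABLE = "locks"  # same hypothetical table as in the earlier sketch

def wait_for_expiry_and_take_over(lock_key, new_owner="host-b"):
    """Take the lock only if the current holder stops renewing it.

    Only elapsed time on *this* client's clock matters; the two hosts'
    clocks never need to agree.
    """
    resp = dynamodb.get_item(TableName=TABLE,
                             Key={"key": {"S": lock_key}},
                             ConsistentRead=True)
    item = resp.get("Item")
    if item is None:
        return True  # no lock item at all; acquire as in the earlier sketch

    seen_rvn = item["recordVersionNumber"]["S"]
    lease_ms = int(item["leaseDurationMs"]["N"])
    time.sleep(lease_ms / 1000.0)  # the real client polls and backs off; simplified here

    try:
        # Succeeds only if the holder never renewed the lock (RVN unchanged).
        dynamodb.put_item(
            TableName=TABLE,
            Item={"key": {"S": lock_key},
                  "owner": {"S": new_owner},
                  "leaseDurationMs": {"N": str(lease_ms)},
                  "recordVersionNumber": {"S": str(uuid.uuid4())}},
            ConditionExpression="recordVersionNumber = :rvn",
            ExpressionAttributeValues={":rvn": {"S": seen_rvn}},
        )
        return True
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # the holder renewed its lease; keep waiting
        raise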

The sequence and architecture diagram following shows the locking protocol. The protocol architecture includes an EC2 Auto Scaling group. The group spans two subnets in two Availability Zones of a single AWS Region and two EC2 instances with a Java application running in each subnet. Each of the instances is trying to acquire a lock on Moe.

(more…)

Amazon Aurora Under the Hood: Quorums and Correlated Failure

Anurag Gupta runs a number of AWS database services, including Amazon Aurora, which he helped design. In this Under the Hood series, Anurag discusses the design considerations and technology underpinning Aurora.

Amazon Aurora storage is a highly distributed system that needs to meet the stringent performance, availability, and durability requirements of a high-end relational database. This post is the first of a four-part series that covers some of the key elements of our design.

There isn’t a lot of publicly available material discussing tradeoffs in real-world durability, availability, and performance at scale. Although this series is based on the considerations involved in designing a transactional database, I believe it should be relevant to anyone architecting systems involving the coordination of mutable distributed state.

In this first post, I discuss how we arrived at the decision to use quorums for Aurora storage and why we distribute six copies of data across three Availability Zones (AZs). Some of this material is also discussed in our recent SIGMOD paper.

Why distributed storage is a good idea, but hard to do well
Let’s first discuss why distributed storage is a good idea. It’s easy to make a database run fast by collocating both the database software and the storage on a single box. The problem is that boxes fail. It takes time to recover from a backup after a failure. Many systems can’t tolerate losing recent data that hasn’t been backed up yet.

Beyond accounting for failures, separating the database instance from its storage improves flexibility. Customers shut databases down. They size them up and down. They add and remove read replicas. Decoupling the storage from the database makes these operations easy, since the underlying storage can just be detached and reattached rather than having to be recreated in a new location. Data has gravity in a way that compute does not.

Of course, moving storage away from compute just creates a dependency on still more devices that can independently fail. That’s why people use replication—either synchronous or asynchronous. If failures are independent, then replication improves durability.

But replication has its own problems. In synchronous replication, all copies must acknowledge before you consider a write to be durable. This approach puts you at the mercy of the slowest disk, node, or network path. Asynchronous replication improves latency, but can result in data loss if there’s a failure before data is replicated and made durable. Neither option is attractive. Failures also require changes to the replica membership set, which is awkward in its own right. Recreating a dropped replica is expensive, so people are generally conservative about doing so. This conservatism means that you might see a few minutes of unavailability before the failed replica is fenced off.

A quorum model
Aurora instead uses a quorum model, where you read from and write to a subset of copies of data. Formally, a quorum system that employs V copies must obey two rules. First, the read set, Vr, and the write set, Vw, must overlap on at least one copy; that is, Vr + Vw > V.

This approach means that if you have three copies of data, the read set and the write set can be two, ensuring each sees the other. This rule ensures that a data item is not read and written by two transactions concurrently. It also ensures that the read quorum contains at least one site with the newest version of the data item.

Second, you need to ensure that the quorum used for a write overlaps with prior write quorums, which is easily done by ensuring that Vw > V/2. This rule ensures that two write operations from two transactions cannot occur concurrently on the same data item. Here are some possible quorum models.

V (#copies)    Vw (write quorum)    Vr (read quorum)
1              1                    1
2              2                    1
3              2                    2
4              3                    2
5              3                    3
6              4                    3
7              4                    4
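The minimum quorums in this table follow directly from the two rules: take the smallest Vw satisfying Vw > V/2, then the smallest Vr satisfying Vr + Vw > V. A short illustrative sketch that reproduces (or extends) the table:

def min_quorums(v):
    """Smallest write and read quorums for V copies.

    Rule 2: Vw > V/2 (any two write quorums overlap).
    Rule 1: Vr + Vw > V (any read quorum overlaps any write quorum).
    """
    vw = v // 2 + 1
    vr = v - vw + 1
    return vw, vr

for v in range(1, 8):
    vw, vr = min_quorums(v)
    print(f"V={v}  Vw={vw}  Vr={vr}")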

Quorum systems have some nice properties. They can deal with the long-term failure of a node as easily as they deal with a transient failure (for example, due to a reboot) or slowness of one of the participants.

(more…)

Implementing DB Instance Stop and Start in Amazon RDS


This post is from Matt Merriel at AWS partner Kloud, and Marc Teichtahl, manager for AWS Partner Solutions Architecture Australia and New Zealand. Kloud uses the new stop and start capabilities in Amazon RDS to lower costs for customers who don’t require 24×7 access to their databases during the testing and development phases of their projects.

Kloud provides professional and managed services to companies that are moving to the cloud. Amazon Relational Database Service (Amazon RDS) makes it easier for Kloud customers to go from project conception to deployment in a highly scalable, cost-effective, available, and durable fashion.

This post outlines deployment considerations, including security policies, filtering, and automation, to help you implement the DB instance stop and start capability in your environment.

Stopping and starting a DB instance—how it works

With the recently released DB instance stop and start feature in Amazon RDS, you can now stop and start database instances for up to seven days at a time. This makes it easier and more affordable to use Amazon RDS databases for development and test purposes when the database is not required to run all the time. This capability is similar to the feature available for Amazon EC2 instances.

To ensure ease of automation and management, the stop and start capability of Amazon RDS databases does not delete any previous backups, snapshots, or transaction logs. They simply remain in place when the database is stopped. As an added level of durability, the Amazon RDS service automatically backs up the DB instance while stopping it. This enables you to do point-in-time database restoration to any point within your configured automated backup retention window, and it decreases the time required for the initial backup after the DB instance is started. You aren’t charged for instance hours while the database instance is stopped.

Starting an instance restores that instance to the configuration at the point in time when the instance was stopped. This includes endpoint configuration, DB parameter groups, option group membership, and security groups.

It’s possible to make changes to an option group that is associated with a stopped DB instance. If these changes are immediately applied, Amazon RDS applies the changes the next time the DB instance is started. Otherwise, Amazon RDS applies the changes during the next maintenance window after the stopped database instance has been started.

It’s important to understand that with all Amazon RDS databases, neither persistent nor permanent options can be removed from a DB instance option group while DB instances are associated with that option group. This is also true of stopped instances. For example, calling the rds_remove_option_from_option_group command fails on a persistent option in an option group that is associated with a DB instance that is in the stopping, stopped, or starting state.

It’s also possible to make changes to a parameter group that is associated with a stopped database instance. If these changes are immediately applied, Amazon RDS applies the changes the next time the stopped database instance is started. Otherwise, Amazon RDS applies the changes during the next maintenance window after the stopped database instance is started.

An Amazon RDS DB instance may be in a stopped state for up to seven days at a time, after which it is automatically started. This is to ensure that required maintenance is applied to stopped instances. If the instance needs to be stopped again, you can do it using lifecycle hooks, Amazon CloudWatch, and AWS Lambda. All Amazon RDS database instance types support the stop and start capability. The feature is not currently supported for Amazon Aurora.
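As a hedged illustration, a minimal AWS Lambda function (Python with boto3) that stops a development instance again might look like the following. The instance identifier and the trigger that invokes it (for example, a scheduled CloudWatch Events rule) are assumptions; this is a sketch, not a complete automation solution.

import boto3

rds = boto3.client("rds")

def lambda_handler(event, context):
    """Stop a development DB instance if it is currently running.

    The instance identifier below is a placeholder; in practice you might
    pass it in via the event or an environment variable.
    """
    instance_id = "my-dev-db-instance"
    status = rds.describe_db_instances(
        DBInstanceIdentifier=instance_id
    )["DBInstances"][0]["DBInstanceStatus"]

    if status == "available":
        rds.stop_db_instance(DBInstanceIdentifier=instance_id)
        return f"stopping {instance_id}"
    return f"{instance_id} is {status}; nothing to do"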

Deployment considerations

The primary use cases that Kloud implements for their customers focus on cost optimization and operational efficiencies within testing and development environments. As such, it’s not necessary to deploy configurations that use read replicas, require high availability via Multi-AZ deployments, or need SQL Server mirroring.

A database instance can still have a snapshot performed while in a stopped state. It’s also worth noting that, like stopping an Amazon EC2 instance, charges are still incurred for the storage and backups of the DB instance regardless of the state.

To ensure consistency and durability of the DB instance and its associated data, once an Amazon RDS DB instance has been stopped, it can’t be modified. This includes migrating a stopped instance into a Multi-AZ deployment, creating a read replica, or deleting a parameter or option group.

If an existing database needs to become Multi-AZ, or requires a read replica or the deletion of a parameter or option group, you can create a second DB instance with the required capabilities by restoring a snapshot of the stopped database.
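A hedged boto3 sketch of that workflow follows: take a snapshot of the stopped instance (or reuse an existing one), wait for it to become available, and restore it into a new Multi-AZ instance. All identifiers are placeholders.

import boto3

rds = boto3.client("rds")

# Snapshot the stopped instance (you can also reuse an existing snapshot).
rds.create_db_snapshot(
    DBInstanceIdentifier="my-stopped-db",
    DBSnapshotIdentifier="my-stopped-db-snap",
)
rds.get_waiter("db_snapshot_available").wait(
    DBSnapshotIdentifier="my-stopped-db-snap"
)

# Restore it as a second instance with the capabilities the original lacks.
rds.restore_db_instance_from_db_snapshot(
    DBInstanceIdentifier="my-multi-az-db",
    DBSnapshotIdentifier="my-stopped-db-snap",
    MultiAZ=True,
)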

Security policies

AWS Identity and Access Management (IAM) is a web service that helps you manage users and user permissions in AWS, along with providing granular management of access to AWS resources.

As with all AWS services, a policy configuration is required to enforce appropriate access to the Amazon RDS stop and start capability. You can do this by allowing the rds:StartDBInstance and rds:StopDBInstance actions with the appropriate IAM policy. The following is a simple example of such a policy:

{
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "rds:DescribeDBInstances",
            "rds:StartDBInstance",
            "rds:StopDBInstance"
        ],
        "Resource": [
            "*"
        ]
    }]
}

(more…)

Strategy for Migrating Partitioned Tables from Oracle to Amazon RDS for PostgreSQL and Amazon Aurora with PostgreSQL Compatibility

Sandeep Kariro is a big data architect and Akm Raziul is a database architect at Amazon Web Services.

Enterprises have been using closed-source database systems for more than four decades. Managing data with these systems has been a critical part of running a successful business in every organization. You have to make a considerable investment up front to obtain the required hardware and enterprise licenses before you can even start building the system. With cloud computing gaining momentum in recent years and technology maturing in the open source system world, interest has been growing in moving critical traditional online transaction processing (OLTP) database workloads to open-source systems such as PostgreSQL, MySQL, and others.

Migrating one database system to another requires significant effort and planning. Even though the basics around the database system generally stay the same, implementation and operational activity on each of them can vary quite a bit.

The AWS Schema Conversion Tool (AWS SCT) helps migrate Oracle schemas and custom stored procedure code to a PostgreSQL-compatible target with minimal effort. AWS SCT generates a migration assessment report that clearly identifies objects and code that require manual conversion or rework. One critical need for high-volume OLTP applications is to have the data partitioned for performance reasons. During the initial assessment, AWS SCT highlights the Oracle partitions that require manual intervention. This is due to the way current versions of PostgreSQL (as of 9.6.2) handle table partitions: PostgreSQL manages partitioned tables differently from Oracle, SQL Server, or MySQL.

In this post, we discuss how to convert Oracle Database native range and list partitions to PostgreSQL native range and list partitions.

Range and list partition definitions in Oracle
Range: Each partition holds a range of values based on the partition key. For example, you can partition data based on a date field. Each partition can hold data for a date range such as a week, a month, or a quarter. The first partition of the table is always open-ended toward lower values. You can optionally make the last partition open-ended as well by defining its upper bound as MAXVALUE. A range partition key can include up to 16 columns.

The following is an example of a table created with a range partition on create date. The table is created with three partitions. The first partition holds all rows with a create date earlier than 01-JAN-2017. The second partition holds rows with a create date from 01-JAN-2017 up to, but not including, 01-JAN-2018. The third partition holds all remaining rows, with a create date of 01-JAN-2018 or later.

CREATE TABLE person 
	(id NUMBER(10) NOT NULL
	,name VARCHAR2(50)
	,gender VARCHAR2(1)
	,create_dt DATE DEFAULT sysdate
	)
PARTITION BY RANGE(create_dt) (
  partition p1 values less than (TO_DATE('01-JAN-2017','DD-MON-YYYY')), 
  partition p2 values less than (TO_DATE('01-JAN-2018','DD-MON-YYYY')), 
  partition p3 values less than (MAXVALUE)
);
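For comparison on the PostgreSQL side, declarative partitioning only arrives in PostgreSQL 10, so on 9.6.x a common pattern is table inheritance with CHECK constraints and an insert-routing trigger. The following is a hedged sketch of that pattern, driven here through psycopg2 with placeholder connection details. It mirrors the Oracle example above and is not necessarily the exact conversion the rest of this post walks through.

import psycopg2  # assumes connection details for your PostgreSQL target

ddl = """
CREATE TABLE person (
    id         NUMERIC(10) NOT NULL,
    name       VARCHAR(50),
    gender     VARCHAR(1),
    create_dt  DATE DEFAULT current_date
);

-- Child tables, one per Oracle partition, with matching bounds.
CREATE TABLE person_p1 (CHECK (create_dt <  DATE '2017-01-01')) INHERITS (person);
CREATE TABLE person_p2 (CHECK (create_dt >= DATE '2017-01-01' AND create_dt < DATE '2018-01-01')) INHERITS (person);
CREATE TABLE person_p3 (CHECK (create_dt >= DATE '2018-01-01')) INHERITS (person);

-- Route inserts on the parent to the right child.
CREATE OR REPLACE FUNCTION person_insert_trigger() RETURNS trigger AS $$
BEGIN
    IF NEW.create_dt < DATE '2017-01-01' THEN
        INSERT INTO person_p1 VALUES (NEW.*);
    ELSIF NEW.create_dt < DATE '2018-01-01' THEN
        INSERT INTO person_p2 VALUES (NEW.*);
    ELSE
        INSERT INTO person_p3 VALUES (NEW.*);
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER person_insert
    BEFORE INSERT ON person
    FOR EACH ROW EXECUTE PROCEDURE person_insert_trigger();
"""

with psycopg2.connect("dbname=targetdb user=postgres") as conn:
    with conn.cursor() as cur:
        cur.execute(ddl)  # psycopg2 sends the statements in a single call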

(more…)

Level Up Your Games with Amazon Aurora

Dhruv Thukral is a solutions architect at Amazon Web Services.

Amazon Aurora is rapidly becoming the relational database of choice for some of the largest and fastest-growing gaming companies in the world. Companies like Zynga and Double-Down Interactive in the Americas, and Grani and Gumi in Asia Pacific, use Amazon Aurora. Aurora gives them the speed and availability of high-end commercial databases, with the simplicity and cost structure of open-source databases.

This potent combination of speed and cost has even allowed companies like Double-Down Interactive to run Aurora at a lower cost than MySQL. The higher performance of Aurora means that you can often use fewer or smaller instances than with MySQL. If you’re running a sharded MySQL environment, which we see many gaming companies do, you can potentially save money by switching to Aurora. At the same time, you can also get two to three times better performance, as you can see later in this blog post. Other potential benefits are higher availability, lower replica lag, and less sharding because you can consolidate existing shards.

In this post, we discuss how Amazon Aurora can help you build a reliable and scalable database tier using a sample reference architecture for a mobile game. We also highlight experiences from gaming customers who have already migrated their existing databases to Amazon Aurora.

The value of choosing the right database

When architecting an application for scale, one key factor is the database tier. The database tier is especially important for games, which are extremely read- and write-heavy. Game data is continuously updated and read as the player progresses through levels, defeats enemies, receives in-game currency, changes inventory, unlocks upgrades, and completes achievements. Each event must be written to your database layer so it isn’t lost. Losing this progress within the game can lead to negatively trending Twitter posts and being paged in the middle of the night.

Developers of games and web apps often use an open-source relational database such as MySQL for their database layer because it is familiar. The querying flexibility and the ACID properties for transactional aspects of the game logic are very appealing for many developers. The traditional way of scaling this tier, especially when using MySQL, is by sharding. In some cases, scaling is also done by splitting the reads off to a read replica. Although this approach lets developers keep adding shards, maintaining such an infrastructure comes with overhead. For example, what if you run out of storage in a shard, or what happens if your read replica goes down? What happens if you unexpectedly lose an entire shard and the data with it?

Why do game developers feel Aurora is the right choice for them?

  1. Drop-in MySQL compatibility: This allows game developers to integrate with Aurora and get all its benefits without changing their applications. They just need to change the endpoint in their database configuration within the application (see the connection sketch after this list). They can keep using the same code and tools that they are used to today.
  2. Not having to make decisions on how much storage to allocate to the database: Deciding how much storage to allocate to your database, especially for production workloads, is one decision that game developers (or developers and DBAs in general) don’t want to make. Allocate too little, and you end up running out of space when you need it the most, requiring you to put your game in maintenance mode while you upgrade. Allocate too much, and you are wasting money. With Aurora, data is stored in a cluster volume: a single, virtual volume built on solid state disk (SSD) drives, with copies of the data across multiple Availability Zones in a single region. Aurora cluster volumes automatically grow as the amount of data in your database increases, up to a maximum size of 64 terabytes (TB), which is also the maximum size of any single table in an Aurora DB cluster. You are only charged for the space that you actually use, and you can start with a volume as small as 10 GB.
  3. Unique Read Replicas: Aurora’s Read Replicas use the same underlying cluster volume as that of the primary. This approach allows for a reduced replication lag even when the replicas are launched in different Availability Zones. Aurora supports up to 15 replicas with minimal impact on the performance of write operations. In contrast, MySQL supports up to 5 replicas, which can impact the performance of write operations and exhibit some replication lag. Additionally, Aurora automatically uses its replicas as failover targets with no data loss. Because these replicas share the same underlying storage as the primary, they lag behind the primary by only tens of milliseconds, making the performance nearly synchronous.
  4. Amazon manages your database for you: Game developers are focused on building great games, not on dealing with maintenance and operations. Amazon Aurora is a managed service that includes operational support and high availability across multiple data centers. You don’t have to worry about installing software or dealing with hardware failures.
  5. Performance boost: Gaming customers have reported two to three times the performance after moving to Amazon Aurora from existing open-source databases. Grani, a top Japanese social gaming publisher, moved their web-based games platform from MySQL to Aurora and shared the following results. The first graph shows their average web transaction response time in their legacy system, broken down by the different layers in their overall stack.

    In the preceding setup, Grani was running MySQL on an r3.4xl instance type in Amazon RDS with Multi-AZ enabled. Their database response times were in the range of 15–22 ms, out of a total response time of about 50 ms. The graph following shows their results after moving to Amazon Aurora.

    They migrated to a similar r3.4xl node with one master and one Read Replica. They were able to get their overall response time down to 5.5 ms. Following is another graph that provides a full picture of the overall response times for their platform database before and after migration.
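As noted in item 1 of the list above, moving to Aurora is usually just an endpoint change. Here is a minimal sketch using a standard MySQL driver (PyMySQL in this case); the cluster endpoint, credentials, database, and table are placeholders.

import pymysql  # any MySQL-compatible driver works unchanged with Aurora

# Before: pointing at a self-managed MySQL host, for example
# host="mysql-shard-01.internal.example.com"

# After: the only change is the Aurora cluster endpoint.
conn = pymysql.connect(
    host="my-game-cluster.cluster-abc123xyz.us-west-2.rds.amazonaws.com",
    user="game_app",
    password="...",
    database="player_state",
)

with conn.cursor() as cur:
    cur.execute(
        "SELECT player_id, level, coins FROM player_progress WHERE player_id = %s",
        (42,),
    )
    print(cur.fetchone())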

(more…)

Understanding Burst vs. Baseline Performance with Amazon RDS and GP2

Phil Intihar is a database engineer at Amazon Web Services.

When we think about database storage, the dimensions that matter are the size, latency, throughput, and IOPS of the volume. IOPS stands for input/output operations per second, and latency is a measure of the time it takes for a single I/O request to complete. As you can imagine, latency and IOPS are closely related and are key indicators of database performance. This post focuses on understanding how to work with Amazon RDS storage and how it relates to IOPS.

Amazon RDS volume types

Amazon RDS volumes are built using Amazon EBS volumes, except for Amazon Aurora, which uses an SSD-backed virtualized storage layer purpose-built for database workloads. RDS currently supports both magnetic and SSD-based storage volume types. However, magnetic volumes are slower and don’t perform consistently, so they are not recommended for performance-focused workloads. So if you’re reading this to learn how to get better performance out of your RDS database, you should avoid magnetic storage.

There are two supported Amazon EBS SSD-based storage types, Provisioned IOPS (called io1) and General Purpose (called gp2). With io1, it’s quite simple to predict IOPS because this is the value you provide when the volume is created. The Amazon EBS documentation states that io1 volumes deliver within 10 percent of the Provisioned IOPS performance 99.9 percent of the time over a given year. In other words, you can expect consistent performance with io1.

The gp2 storage type also has a base IOPS that is set when the volume is created. However, you don’t provide a value for the IOPS directly—instead, IOPS is a function of the size of the volume. The IOPS for a gp2 volume is the size of the volume in GiB x 3, with a minimum of 100 IOPS and a maximum of 10K IOPS. The gp2 volumes have a characteristic called burst mode that is often misunderstood. Let’s delve into the performance characteristics of gp2 and understand burst versus baseline performance.

Comparing performance: burst vs. baseline examples

To understand burst mode, you must be aware that every gp2 volume, regardless of size, starts with 5.4 million I/O credits, which it can spend at a rate of up to 3000 IOPS. This means that even for very small volumes, you start with a high-performing volume. This is ideal for “bursty” workloads, such as daily reporting and recurring extract, transform, and load (ETL) jobs. It is also good for workloads that don’t require high sustained IOPS.

How does this work? Well, as stated earlier, each gp2 volume starts with an I/O credit balance that, if fully spent at the burst rate, works out to 3000 IOPS for 30 minutes. The credit balance is continuously replenished at the baseline rate of 3 IOPS per GiB of volume size. Consider a daily ETL workload that uses a lot of I/O. For the daily job, gp2 can burst, and during downtime, burst credits can be replenished for the next day’s run. Now consider a workload that never consumes more IOPS than are being replenished. Such a workload continues to see very good performance, because its credit balance never runs out.
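To make the arithmetic concrete, here is a small illustrative calculation of baseline IOPS and approximate burst duration for a few gp2 volume sizes, using the numbers above. It is a simplification that ignores throughput limits and I/O size.

def gp2_baseline_iops(size_gib):
    """Baseline = 3 IOPS per GiB, floored at 100 and capped at 10,000."""
    return min(max(3 * size_gib, 100), 10_000)

def gp2_burst_minutes(size_gib, burst_iops=3000, credits=5_400_000):
    """How long a full credit bucket lasts while bursting at 3000 IOPS.

    Credits drain at the burst rate but refill at 3 per GiB per second,
    so the net drain is the difference between the two.
    """
    refill_per_sec = 3 * size_gib
    net_drain = burst_iops - refill_per_sec
    if net_drain <= 0:
        return float("inf")  # at 1000 GiB and up, baseline already reaches 3000
    return credits / net_drain / 60

for size in (20, 100, 500, 1000):
    print(size, "GiB:", gp2_baseline_iops(size), "baseline IOPS,",
          round(gp2_burst_minutes(size), 1), "minutes of burst")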

In the following example, I created an Amazon RDS instance with a 20 GiB gp2 volume. Such a volume bursts to 3000 IOPS. But once the burst is exhausted, it delivers only 100 IOPS, since 100 is the minimum IOPS. The point to stress here is that the small volume performs very well for this simulated nightly job. And then over the course of the next 12 hours, the burst credits accumulate in time for the next day’s nightly job.

(more…)

Accelerate Alfresco Content Services Deployment on Amazon Aurora with AWS Quick Start

Linda Wu is a product manager at Amazon Web Services.

When we built Amazon Aurora a few years ago, we had the following tenets in mind:

  • Combine the speed and availability of a high-end commercial database with the simplicity and cost-effectiveness of open source.
  • Deliver full compatibility with the popular open-source database MySQL so that customers don’t need to change their existing applications.
  • Offer a managed service so that customers can focus on developing applications rather than worry about keeping their databases up, and only pay for what they use.
  • Provide a cloud-native database that meets the scalability needs of modern applications.

Today, thousands of customers and partners have adopted Amazon Aurora (MySQL Compatible). We see customers and ISV partners coming to Amazon Aurora from various sources. Some are moving from on-premises databases to Amazon Aurora; others are migrating from their MySQL or commercial databases on Amazon EC2 to Amazon Aurora.

Why Amazon Aurora?
Alfresco Software, an enterprise open-source content and process services company, was one of the early adopters of Amazon Aurora. The company wanted to provide highly scalable content management services in the cloud. It turned to AWS services, including Amazon Aurora, to realize its goal.

In a typical customer deployment, shown in the following diagram, Amazon Aurora provides a high-performance database to store metadata and metadata hierarchies for the content that’s managed on the Alfresco platform. The metadata helps people find the right content quickly and easily. It’s also used to move documents through a business process, control content access, and automate records management. In a sense, Amazon Aurora powers “the brains” of Alfresco Content Services.

 

(more…)

Use Amazon S3 to Store a Single Amazon Elasticsearch Service Index

Jon Handler (@_searchgeek) is an AWS solutions architect specializing in search technologies.

As detailed in our documentation, you can use the Elasticsearch API actions in Amazon Elasticsearch Service to take manual snapshots of your domain. You can easily back up your entire domain this way. However, did you know you can also snapshot and restore a single index, or multiple indexes? This blog post walks you through backing up and restoring a single index by using an Amazon S3 bucket.

Note: This blog post uses an Amazon Elasticsearch Service (Amazon ES) version 5.3 domain.

If you’re running a log analytics workload, use this technique to move older indexes off your cluster, retaining them in S3 for future use. You’ll save on cost but still be able to retrieve and explore the data. You can also use this technique to migrate an index from one Amazon ES domain to another for version upgrades, or to copy an index to another AWS Region and deploy it there.

Set up Amazon S3 and AWS Identity and Access Management (IAM)
The first thing you need to do is create a bucket in Amazon S3. I named my bucket es-s3-repository.
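To preview where this is going, here is a hedged sketch of the two signed requests involved once the bucket and IAM role are in place: registering the bucket as a snapshot repository, and then snapshotting a single index. The domain endpoint, role ARN, and the repository, snapshot, and index names are placeholders; the IAM setup itself is covered in the rest of this post.

import boto3
import requests
from requests_aws4auth import AWS4Auth  # assumes this helper package is installed

region = "us-east-1"
host = "https://search-my-domain-abc123.us-east-1.es.amazonaws.com"  # placeholder endpoint

creds = boto3.Session().get_credentials()
awsauth = AWS4Auth(creds.access_key, creds.secret_key, region, "es",
                   session_token=creds.token)

# 1. Register the S3 bucket as a manual snapshot repository.
repo_body = {
    "type": "s3",
    "settings": {
        "bucket": "es-s3-repository",
        "region": region,
        "role_arn": "arn:aws:iam::123456789012:role/EsSnapshotRole",  # placeholder
    },
}
r = requests.put(f"{host}/_snapshot/my-repo", auth=awsauth, json=repo_body)
r.raise_for_status()

# 2. Snapshot just one index into that repository.
snap_body = {"indices": "movies", "include_global_state": False}
r = requests.put(f"{host}/_snapshot/my-repo/snapshot-1", auth=awsauth, json=snap_body)
r.raise_for_status()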

(more…)