Category: RDS PostgreSQL


Picking Your First Database to Migrate to Amazon RDS or Amazon Aurora with PostgreSQL-compatibility

David Rader is the vice president of engineering at OpenSCG.

OpenSCG is an AWS Consulting Partner and a leading expert in PostgreSQL, helping customers migrate to, operate, and optimize PostgreSQL and other data platforms on-premises and in the cloud.

In a previous post, we covered the overall approach you can take to migrate your current database to PostgreSQL. When you want to nudge your organization to start migrating from Oracle (or Sybase or Microsoft SQL Server) to PostgreSQL or a PostgreSQL-compatible engine on AWS, you need to choose a candidate for your “first migration project.”

This post helps guide you, as you look across your portfolio of databases (and applications), to think about the various factors that make migrations easy or difficult. Following this guidance can help ensure that your first migration delivers enough benefit to enable your organization to take on a larger migration effort.

Choosing your first database

To determine which database to migrate first, you should evaluate several technical and non-technical dimensions, including the following:

  • Risk vs. reward
  • Database size and complexity
  • Applications and integration
  • Organizational readiness
  • Business acceptance

These factors determine the level of effort, time, and organizational willpower that is required to complete a successful migration.

(TL;DR) For the impatient among you, the best case is to pick a database that meets the following conditions: it is significant but not mission critical, you have good test plans and test cases, your team knows the code and business logic, the development team is nimble, and the business owner is a good partner. The further you move from this ideal, the more likely it is that your first migration will become overly difficult, won’t impress the organization enough to drive change, or will fail outright.

If you work at a smaller SaaS provider whose single big database is the product, you may have only one database to choose from. But you can still understand what makes the migration process easier. And if you must demonstrate the feasibility of moving one of your enterprise’s largest and most complex databases, you might need to run a “proof of concept” before your first migration for production use.

Let’s look at each of these dimensions and how they impact the choice of a first database to migrate to PostgreSQL.

(more…)

Strategy for Migrating Partitioned Tables from Oracle to Amazon RDS for PostgreSQL and Amazon Aurora with PostgreSQL Compatibility

Sandeep Kariro is a big data architect and Akm Raziul is a database architect at Amazon Web Services.

Enterprises have been using closed-source database systems for more than four decades. Managing data with these systems has been a critical part of running a successful business in every organization. You have to make a considerable investment up front to obtain the required hardware and enterprise licenses before you can even start building the system. With cloud computing gaining momentum in recent years and technology maturing in the open source system world, interest has been growing in moving critical traditional online transaction processing (OLTP) database workloads to open-source systems such as PostgreSQL, MySQL, and others.

Migrating one database system to another requires significant effort and planning. Even though the basics around the database system generally stay the same, implementation and operational activity on each of them can vary quite a bit.

The AWS Schema Conversion Tool (AWS SCT) helps migrate Oracle schemas and custom stored procedure code to a PostgreSQL-compatible target with minimal effort. AWS SCT generates a migration assessment report that clearly identifies objects and code that require manual conversion or rework. One critical need for high volume OLTP applications is to have the data partitioned for performance reasons. During the initial assessment, AWS SCT highlights the Oracle partitions that require manual intervention. This is due to the way current versions of PostgreSQL (as of 9.6.2) handle table partitions. PostgreSQL manages partitioned tables differently from Oracle, SQL Server, or MySQL database partitions.

In this post, we discuss how to convert Oracle Database native range and list partitions to PostgreSQL native range and list partitions.

Range and list partition definitions in Oracle
Range: Each partition holds a range of values based on the partition key. For example, you can partition data based on a date field. Each partition can hold data for a date range such as a week, a month, or a quarter. The first partition of the table is always open-ended toward lower values. You can optionally make the last partition open-ended as well by defining its upper bound as MAXVALUE. A range partition key can include up to 16 columns.

The following is an example of a table created with a range partition on create date. The table is created with three partitions. The first partition holds all data with a create date before 01-JAN-2017. The second partition holds data with a create date from 01-JAN-2017 through 31-DEC-2017. The third partition holds data with a create date after 31-DEC-2017.

CREATE TABLE person 
	(id NUMBER(10) NOT NULL
	,name VARCHAR2(50)
	,gender VARCHAR2(1)
	,create_dt DATE DEFAULT sysdate
	)
PARTITION BY RANGE(create_dt) (
  partition p1 values less than ('01-JAN-2017'), 
  partition p2 values less than ('01-JAN-2018'), 
  partition p3 values less than (MAXVALUE)
);
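
For comparison, here is a hedged sketch (not taken from the original post) of how the same table could be declared with PostgreSQL native declarative partitioning, which requires PostgreSQL 10 or later (11 or later for the MINVALUE/MAXVALUE bounds shown). On PostgreSQL 9.6 and earlier, you would instead emulate partitions with table inheritance, CHECK constraints, and an INSERT trigger.

-- Sketch only: PostgreSQL 11+ declarative range partitioning mirroring the Oracle example above.
CREATE TABLE person (
    id        numeric(10) NOT NULL,
    name      varchar(50),
    gender    varchar(1),
    create_dt date DEFAULT current_date
) PARTITION BY RANGE (create_dt);

CREATE TABLE person_p1 PARTITION OF person
    FOR VALUES FROM (MINVALUE) TO ('2017-01-01');
CREATE TABLE person_p2 PARTITION OF person
    FOR VALUES FROM ('2017-01-01') TO ('2018-01-01');
CREATE TABLE person_p3 PARTITION OF person
    FOR VALUES FROM ('2018-01-01') TO (MAXVALUE);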

(more…)

Using the AWS Database Migration Service, Amazon S3, and AWS Lambda for Database Analytics

Jeff Levine is a solutions architect for Amazon Web Services.

The AWS Database Migration Service (AWS DMS) supports Amazon S3 as a migration target. The service enables you to extract information from any database supported by DMS and write it to Amazon S3 in a format that can be used by almost any application. You can extract the entire database and replicate ongoing changes, including additions, deletions, and updates, using change data capture (CDC) technology. You can even process the changes with AWS Lambda or Amazon Kinesis Firehose. By using Lambda or Firehose, you can extend the capabilities of AWS services to existing database environments, both within AWS and in other locations.

Overview of the example

Let’s consider an example that shows how to bring these services together. We use the AWS Database Migration Service to migrate Twitter statistics produced by the City of Seattle, Washington, from Amazon RDS for PostgreSQL to Amazon S3, and we use AWS Lambda for analysis. We see how DMS handles both the migration of the initial database contents and ongoing change data capture. Here’s a diagram of what we plan to do.

We will do the following:

  1. Create two datasets containing Twitter account statistics. One initializes a database table. The other contains subsequent updates to that table, which are processed by the change data capture feature.
  2. Create an Amazon RDS for PostgreSQL database instance, define a table, and load the first dataset (a sketch of this step appears after this list).
  3. Create an Amazon S3 bucket to serve as the target.
  4. Create an AWS Lambda function to analyze S3 objects upon creation.
  5. Configure and invoke the AWS Database Migration Service to migrate the initial dataset that we just loaded into RDS to the S3 bucket. Doing this demonstrates the initial load phase of DMS and shows AWS Lambda processing the dataset.
  6. Copy the second dataset into the database. Doing this demonstrates the change data capture feature of DMS and shows AWS Lambda processing the dataset.
  7. Examine the contents of the Amazon S3 bucket to see the results of both the initial migration and the change data capture.
  8. View the results of the analysis in Amazon CloudWatch.

Prerequisites and assumptions

You will need the following:

  • An AWS account that provides access to the services shown in the diagram.
  • Working knowledge of these services.
  • A utility that connects to a PostgreSQL database, such as psql.

Additionally:

  • We will configure all services in the same VPC and region to simplify networking considerations.
  • The VPC must have an S3 endpoint.

(more…)

Migrate Your Procedural SQL Code with the AWS Schema Conversion Tool

Database administrators and developers rely on relational databases to store data for applications. As Forbes noted in 2016, the development of open source offerings for database management systems like PostgreSQL is causing a growing number of enterprises to migrate to lower-cost solutions for their data storage. The move to the cloud often provides an excellent opportunity to make such a change.

If your system contains a lot of business logic in the database, it can be quite a large task to translate between SQL dialects line by line. Fortunately, Amazon Web Services offers the AWS Schema Conversion Tool (AWS SCT), a standalone application with a project-based user interface that does the heavy lifting of translating SQL dialects for you.

Using AWS SCT
This post demonstrates how easy it is to migrate procedural code from Oracle to PostgreSQL using AWS SCT. It also shows how to spot the differences between the two dialects and how to modify the code when needed. You can use AWS SCT to generate a high-level overview of the effort required to convert an Oracle database to Amazon RDS for PostgreSQL. In the example, the tool is running on Windows, but it can run on a number of different operating systems. To download AWS SCT, see Installing and Updating the AWS Schema Conversion Tool. For general information about the tool, start with What Is the AWS Schema Conversion Tool?

The example uses an Oracle database on Amazon EC2 and the well-known SCOTT/TIGER database as a starting point. Three stored procedures are added to demonstrate how AWS SCT handles common dialect translations. For information about installing the base SCOTT schema objects, see the Oracle FAQ wiki. You can download additional Oracle procedures and the translated PostgreSQL examples from this repository.
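
To give a feel for the kind of dialect translation involved, here is a hedged sketch (not one of the procedures from the repository, and the names are hypothetical) of a simple Oracle procedure against the SCOTT schema alongside a hand-written PostgreSQL equivalent of the sort such a conversion typically produces.

-- Oracle (PL/SQL): look up an employee name by employee number.
CREATE OR REPLACE PROCEDURE get_emp_name (p_empno IN NUMBER, p_ename OUT VARCHAR2)
AS
BEGIN
  SELECT ename INTO p_ename FROM emp WHERE empno = p_empno;
END;
/

-- PostgreSQL (PL/pgSQL): a common conversion target is a function with an OUT parameter.
CREATE OR REPLACE FUNCTION get_emp_name (p_empno numeric, OUT p_ename varchar)
AS $$
BEGIN
  SELECT ename INTO p_ename FROM emp WHERE empno = p_empno;
END;
$$ LANGUAGE plpgsql;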

Prerequisites
The example in this post requires the following:

  • A source Oracle database with SCOTT schema objects installed
  • Three additional stored procedures:
    emp_by_job (source file: oracle-emp-by-job.sql)
    get_emp_info (source file: oracle-get-emp-info.sql)
    employee_report (source file: oracle-employee-report.sql)
  • Target Amazon RDS for PostgreSQL database
  • AWS SCT installed

Generate an assessment report
Typically, the first step in a migration is to understand its feasibility and the effort it will require. You can use AWS SCT to generate a high-level overview of the work ahead. In this example, open the context (right-click) menu for the SCOTT schema, and choose Create Report. The assessment report provides a high-level summary of how much of the schema can be converted automatically from Oracle to PostgreSQL, and the work that will be left after conversion.

The following screenshot shows the results of the assessment.

(more…)

Migrating to PostgreSQL Lifecycle

This is a guest post by David Rader, the vice president of product development at OpenSCG.

OpenSCG is an AWS Consulting Partner and a leading expert in PostgreSQL, helping customers migrate to, operate, and optimize PostgreSQL and other data platforms on-premises and in the cloud.

There is a lot of interest these days in migrating data from relational databases to the open source PostgreSQL. PostgreSQL is a great choice for migration, offering advanced features, high performance, rock-solid data integrity, and a flexible open source license. PostgreSQL is ANSI SQL compliant, supports drivers for nearly all development languages, and has a strong community of active committers and companies to provide support.

This post describes the overall approach for migrating an application from your current database to PostgreSQL.

Choosing a database or application

When you’re deciding which of your applications and databases to start with, you should consider several factors. If this is your company’s first PostgreSQL migration, you want to start with a less challenging, lower-risk migration to get your feet wet. Just as with any technology switch, don’t choose your mission-critical, 24/7, high-volume, main revenue generator as the first application you try to run in production on a technology you’re just learning. PostgreSQL is certainly capable of running that important application, but you should build up some experience with the technology before making a “bet-the-company” migration. If, for some reason, you do have to jump into the deep end right away, make sure that you hire experienced database administrators (DBAs) who can help you do it right the first time.

For database migrations, you need to think about the applications that use the database. In general, an in-house application or website that you control the source code for is easier to modify for a new database. A third-party commercial software product that doesn’t provide support for PostgreSQL is usually a non-starter.

Departmental applications or reporting data marts are lower-risk choices that give you an opportunity to introduce the new technology to your organization. Dashboards or extract, transform, load (ETL) tools that work against multiple databases through ODBC or JDBC connections are great candidates because of their very high compatibility and low modification requirements.

Migration assessment

As a first step in planning your migration, analyze the application and estimate how easy or difficult it will be to migrate it from your existing database to PostgreSQL. This “Migration Assessment” stage should cover several areas, including technology, personnel, and processes.

Technology issues

The technical aspect is the easiest to evaluate. The level of effort needed to migrate the data and database code is determined by the size and complexity of the current database, the type of data access layer that is used, how many database objects there are (functions, triggers, and stored procedures), and whether any database-specific functionality is used.

You should analyze the application code in addition to the database to understand what changes might be required in the application. For example, an application that has a lot of dynamic SQL scattered throughout the code requires more effort to convert than an application with a clean data access layer.

(more…)

Use SQL to map users, roles, and grants from Oracle to PostgreSQL

Many customers are undertaking the journey to open source relational database management system (RDBMS) databases. With the latest release of the AWS Schema Conversion Tool (AWS SCT), it’s easy to change your database engine. AWS SCT makes heterogeneous database migrations easier by automatically converting procedural code from the source SQL dialect into the target SQL dialect. Any code that the tool can’t convert automatically (due to incompatible constructs between SQL dialects) is clearly marked so that you can convert it yourself, often with suggestions.

A few weeks ago, we featured the blog post Introducing AWS Schema Conversion Tool Version 1.0.502 by Eran Schitzer, which can help you get started. You can also get started with AWS SCT by following the steps in the AWS documentation to create a project, connect to your source, and run an assessment report.

For a complete end-to-end migration, you need to create and prepare a few items in your target database. Items like users, roles, and grants that have to be re-created or reproduced on the target can be a challenge if you don’t use a repository to store that information. This post outlines a method for extracting users, grants, and roles from an Amazon EC2 Oracle instance and applying them to an Amazon RDS for PostgreSQL instance. You can download the code used in this blog post from this repository.
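
To give a flavor of the approach (a hedged sketch, not the script from the linked repository), you can query the Oracle data dictionary to generate PostgreSQL DDL for roles and role membership, review the output, and then run it against the RDS PostgreSQL target.

-- Run on the Oracle source as a privileged user: generate CREATE ROLE statements
-- for regular database accounts (the filtering of built-in accounts is simplified here;
-- passwords must be set separately on the target).
SELECT 'CREATE ROLE ' || LOWER(username) || ' LOGIN;' AS pg_ddl
  FROM dba_users
 WHERE account_status = 'OPEN'
   AND username NOT IN ('SYS', 'SYSTEM');

-- Translate granted Oracle roles into PostgreSQL role membership grants.
SELECT 'GRANT ' || LOWER(granted_role) || ' TO ' || LOWER(grantee) || ';' AS pg_ddl
  FROM dba_role_privs
 WHERE grantee NOT IN ('SYS', 'SYSTEM');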

Create a target database

In preparing for migration, you first create a target RDS PostgreSQL database and establish security groups for connectivity.

Create an Amazon RDS target
Create a target RDS PostgreSQL database in your VPC. You can find a complete set of instructions in the Amazon RDS User Guide. Be sure to attach security groups to the new RDS instance with adequate database ingress and egress rules for your client machine. My EC2 Oracle instance is on the default port 1521, and my RDS PostgreSQL instance is on the default port 5432. I’m using the same security group for both instances. I restricted inbound traffic to those ports from my client but allowed all local traffic within the security group itself.

(more…)

Wave: A Private Location App Running on Amazon RDS

By Pablo Clemente, CTO, Wave

How many times have you endured the painful process of meeting up with someone and having to constantly text or call them to find out where exactly they are? Doing this is not only inefficient and frustrating, but also a potential danger for anyone who is driving or moving. Recently, several smartphone apps have appeared that are designed to locate your contacts, such as Facebook’s Find My Friends and Google’s Latitude. The Wave app locates your contacts and improves on earlier solutions in a few ways.

 

(more…)

Performing Parallel Queries and Phrase Searching with Amazon RDS for PostgreSQL 9.6.1

Jeff Levine is a solutions architect for Amazon Web Services.

As a solutions architect for Amazon Web Services, one of my responsibilities is to help our customers learn about new features and capabilities of our services. Amazon RDS makes it possible to set up, operate, and scale relational databases in the cloud without having to focus on time-consuming administrative tasks. Amazon RDS recently announced support for PostgreSQL 9.6.1. In this blog post, I discuss two of the features included in this release: parallel queries and phrase searching.

Parallel Queries
Parallel queries make it possible to use multiple CPU cores to speed up the execution of queries by creating multiple worker processes. Each worker process performs the query on a subset of data. The results are then combined to produce the final query result.  The number of worker processes is determined by the value of the parameter max_parallel_workers_per_gather. The default value of this parameter is 0, which disables parallelism.
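
For example, here is a minimal sketch (the table name is hypothetical) of turning on parallelism for a session and checking whether the planner chooses a parallel plan.

-- Allow up to two parallel workers per Gather node for this session.
SET max_parallel_workers_per_gather = 2;

-- Inspect the plan for an aggregate over a large (hypothetical) table.
-- With parallelism enabled, the plan typically shows a Gather node above
-- a Partial Aggregate and a Parallel Seq Scan.
EXPLAIN SELECT count(*) FROM pgbench_accounts;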

In the examples that follow, I’ve created a VPC with a PostgreSQL RDS database on a db.m4.large instance class with 2 vCPUs and 8 GB RAM. I’m going to use pgcli, a command line tool for PostgreSQL, to interact with the database named pg961db to create a table.

(Screenshot: pgcli command line session)

(more…)

Implement an Early Warning System for Transaction ID Wraparound in Amazon RDS for PostgreSQL

Shawn McCoy is a database engineer in Amazon Web Services.

As a PostgreSQL database owner, you know that critical maintenance operations are handled by the autovacuum process. The effectiveness of this background process is constantly being improved with each release. For most PostgreSQL users, the default settings work just fine. However, some workloads and usage patterns require customized parameters. I’ll cover two areas in this post: first, what you can do to monitor your Amazon RDS for PostgreSQL database’s transaction ID health, and second, common issues I’ve helped customers work through once a problem is identified.

What is “transaction ID” health?
A PostgreSQL database can have two billion “in-flight” unvacuumed transactions before PostgreSQL takes dramatic action to avoid data loss. If the number of unvacuumed transactions reaches (2^31 - 10,000,000), the log starts warning that vacuuming is needed. If the number of unvacuumed transactions reaches (2^31 - 1,000,000), PostgreSQL sets the database to read-only mode and requires an offline, single-user, standalone vacuum. This vacuum requires multiple hours or days of downtime (depending on database size). A very detailed explanation of transaction ID wraparound is found in the PostgreSQL documentation.

Early warning is the key
We now know that around 2.1 billion unvacuumed transactions is “where the world ends,” to quote the source code comment. There are several reasons a transaction can become this old, but let’s first look at how you can be alerted to this pending problem. Although transaction ID age can be monitored with manual queries, we have introduced an Amazon CloudWatch metric, MaximumUsedTransactionIDs, to make setting this kind of alert easier for you. To generate this metric, the Amazon RDS agent runs this query:

SELECT max(age(datfrozenxid)) FROM pg_database;
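
If the metric starts climbing, a natural next step (a sketch of manual queries, not something the RDS agent runs for you) is to drill down from the database-level age to the individual tables whose relfrozenxid is oldest, because those are the tables autovacuum most needs to freeze.

-- Per-database transaction ID age, the same information behind the CloudWatch metric.
SELECT datname, age(datfrozenxid) AS xid_age
  FROM pg_database
 ORDER BY xid_age DESC;

-- The tables in the current database with the oldest relfrozenxid.
SELECT c.oid::regclass AS table_name,
       age(c.relfrozenxid) AS xid_age
  FROM pg_class c
 WHERE c.relkind IN ('r', 't')      -- ordinary tables and TOAST tables
 ORDER BY age(c.relfrozenxid) DESC
 LIMIT 20;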

From my experience working with customers, I recommend an alarm when this metric reaches a value of 1 billion. For some situations, a low severity warning at 500 million might also be useful. Let’s take a look at configuring a CloudWatch alarm, as described in the CloudWatch documentation.

First, choose CloudWatch in the AWS Management Console:

(Screenshot: CloudWatch in the AWS Management Console)

(more…)

Migrate PostgreSQL Databases and Perform Ongoing Replication with the AWS Database Migration Service

Jeff Levine is a solutions architect for Amazon Web Services.

As a solutions architect for Amazon Web Services, I help our customers migrate workloads to the AWS Cloud, a key part of which includes their databases. The AWS Database Migration Service (AWS DMS) enables organizations to migrate data to and from a variety of databases located in Amazon Relational Database Service (Amazon RDS), running on Amazon EC2, or running on-premises. The service supports homogeneous migrations such as Oracle to Oracle, as well as heterogeneous migrations such as Oracle to Amazon Aurora. You can also customize table mappings and perform translations. In this post, I talk about two things: first, using an RDS PostgreSQL source database with DMS, and second, using the recently announced continuous data replication feature, which can keep the target database up to date after the initial migration.

Using an RDS PostgreSQL Source
You might think that migration means just moving data from one location to another, but the process involves far more. The process must also keep up with transactions that take place during the migration, so that the database is left in a consistent state upon completion. Accordingly, when we look at migrating from an Amazon RDS PostgreSQL source database, we need to take into account both how data is moved and also environmental factors that can affect the ongoing processing of transactions while the migration is happening.

Let’s take the case of a PostgreSQL to MySQL migration. I’ll begin with a simple table definition.

CREATE TABLE subscribers (
    subscriber_id   SERIAL NOT NULL PRIMARY KEY,
    subscriber_info json NOT NULL
);

I’m going to use pgAdmin, a popular open source administration tool for PostgreSQL, to create the table and display the specification in the source database.

After I run the DMS migration task, here’s how the newly created subscribers table appears in the target database using the program DbVisualizer.

Notice how the serial column is now an integer and the subscriber_info column has been converted to longtext. To understand what happened, we need to consider how DMS operates. DMS functions as an intermediate broker between a variety of different database engines. To make this conversion happen, DMS first maps the source data type (JSON in this case) to an intermediate DMS type and then maps that type to the target data type (longtext). It’s very important to take these mappings into account when designing your migration strategy. Note that I used the default mapping; DMS also allows you to customize the mappings if you want.
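
For illustration, the converted table on the MySQL target might look roughly like the following. This is a hedged sketch of the effect of the default mappings, not the exact DDL that DMS emits.

-- MySQL target: SERIAL has become a plain integer key and json has become longtext.
CREATE TABLE subscribers (
  subscriber_id   INT NOT NULL,
  subscriber_info LONGTEXT NOT NULL,
  PRIMARY KEY (subscriber_id)
);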

We also need to look at source-related performance factors such as replication activity and the use of write-ahead logging (WAL) that can affect the migration. Using Amazon RDS, you can change database parameters using parameter groups. One adjustment we suggest is changing the wal_sender_timeout parameter to 0. To do this, create a parameter group for your RDS PostgreSQL database and adjust the parameter accordingly. Because this is a dynamic parameter, the adjustment will take effect immediately. You can read about all the nuances of using PostgreSQL DMS sources in the RDS documentation.

Using Continuous Data Replication
Additionally, AWS recently announced continuous data replication for DMS, which enables you to keep your database up to date after the initial migration. With this feature, you can now have high availability during the replication process by specifying Multi-AZ when you create your replication instances.

When Multi-AZ is selected, two replication hosts will be created, each of which has its own IP address. Remember to adjust the underlying source and target security groups (or their on-premises equivalents) to grant both instances the appropriate level of access.

The AWS Database Migration Service offers a powerful set of migration capabilities. It is important to consider how the service works behind the scenes as part of developing an overall migration plan. With continuous data replication, you can now keep your databases up to date after the initial load. It’s easy to get started: follow our Getting Started Guide to migrate your database using the AWS Database Migration Service with just a few clicks.