Category: Schema Conversion Tool (SCT)


Migrating a SQL Server Database to a MySQL-Compatible Database Engine

This post provides an overview of how you can migrate your Microsoft SQL Server database to a MySQL-compatible database engine such as Amazon RDS for MySQL, Amazon RDS for MariaDB, or Amazon Aurora MySQL.

The following are the two main parts of a database migration:

  1. Schema conversion: Converting the schema objects is usually the most time-consuming operation in a heterogeneous database migration. It is the foundation of the database and must be handled in a well-planned manner. If the schema conversion is done appropriately, a major milestone for the heterogeneous migration is complete.
  2. Data migration: The data itself builds on the schema foundation mentioned previously. If that foundation is laid properly, migrating the data is comparatively straightforward when best practices are followed.

In this post, we go over how to use the AWS Schema Conversion Tool (AWS SCT) and the AWS Database Migration Service (AWS DMS) to migrate a SQL Server database to popular MySQL-based database engines like Aurora MySQL, MySQL, and MariaDB. For brevity, we refer to all of these database engines as MySQL.

The database object semantics of SQL Server and MySQL are similar. However, there are major architectural differences that you must consider when migrating from SQL Server to MySQL. For example, in MySQL, “database” and “schema” share the same meaning; there is no functional difference between the two. The following example illustrates this point:

mysql> create schema test01;
Query OK, 1 row affected (0.01 sec)

mysql> create database test01;
ERROR 1007 (HY000): Can't create database 'test01'; database exists

As the example shows, “database” and “schema” are synonymous and don’t have any separate meaning. When it comes to referring to a table within a database, the complete table identifier in MySQL looks like databasename.tablename.

SQL Server, on the other hand, assigns distinct meanings and functionality to the “database” and “schema” keywords. In SQL Server, a database is the main container that holds all objects, data, and log files. A schema is an object within a specific database that logically groups the other database objects together. SQL Server uses the schema name dbo by default, but you can change it to meet organizational, functional, or business needs. The complete table identifier in SQL Server is Databasename.Schemaname.tablename.
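The following T-SQL sketch illustrates the distinction. The database, schema, and table names here are hypothetical and used only for illustration:

-- In SQL Server, a database and a schema are separate objects.
CREATE DATABASE SalesDB;
GO
USE SalesDB;
GO
CREATE SCHEMA reporting;   -- a schema inside the SalesDB database
GO
CREATE TABLE reporting.orders (order_id INT PRIMARY KEY);
GO
-- The fully qualified identifier follows the database.schema.table pattern.
SELECT COUNT(*) FROM SalesDB.reporting.orders;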

(more…)

Using Amazon Redshift for Fast Analytical Reports

With digital data growing at an unprecedented rate, enterprises are finding it difficult to ingest, store, and analyze data quickly while keeping costs low. Traditional data warehouse systems need constant upgrades in compute and storage to meet these challenges.

In this post, we provide insights into how AWS Premier Partner Wipro helped their customer (a leading US business services company) move their workload from an on-premises data warehouse to Amazon Redshift. This move enabled the company to reap the benefits of scalability and performance without affecting how their end users consume reports.

Current data warehouse environment and challenges faced
The customer was running a commercial enterprise data warehouse that contained aggregated data from different internal reporting systems across geographies. Their primary goal was to provide quick and accurate analytics to drive faster business decisions. The user base was distributed globally. Meeting this goal was difficult because of the following challenges:

  • The data warehouse (5 TB) was growing at over 20 percent year over year (YoY), with higher growth expected in the future. This growth required them to keep upgrading the hardware to meet the storage and compute needs, which was expensive.
  • Analytical dashboards experienced performance issues as the data and user base grew.
  • Licensing was based on CPU cores, so adding hardware to support the growth also required additional investment in licenses, further spiraling the costs.

Migrating to Amazon Redshift
Wipro used their Cloud Data Warehouse Readiness Solution (CDRS) strategy to migrate data to Amazon Redshift. Using CDRS, they migrated 4.x billion records to Amazon Redshift. CDRS includes a Source Analyzer that created the target data model and generated the data definition language (DDL) for the tables that needed to be migrated to Amazon Redshift. The architecture included Talend (a data integration platform) running on Amazon EC2 to extract data from various source systems for ongoing change data capture (CDC) and then load it into Amazon Redshift. Talend has several built-in connectors for connecting to various sources and extracting the data.
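To give a sense of what such generated DDL looks like, the following is a minimal, hypothetical Amazon Redshift table definition; the table, columns, and distribution and sort key choices are illustrative only and are not part of Wipro’s actual data model:

CREATE TABLE sales_fact (
    sale_id   BIGINT        NOT NULL,
    region_id INTEGER       NOT NULL,
    sale_date DATE          NOT NULL,
    amount    DECIMAL(12,2)
)
DISTSTYLE KEY
DISTKEY (region_id)
SORTKEY (sale_date);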

The following diagram shows the architecture of this migration setup:

(more…)

Categorizing and Prioritizing a Large-Scale Move to an Open Source Database

The AWS Schema Conversion Tool (AWS SCT) and AWS Database Migration Service (AWS DMS) are tools that help facilitate and simplify migrating your commercial database to a variety of engines on Amazon RDS. AWS SCT specifically helps simplify proprietary database migrations to open source. It does this by automatically converting your source database schema and a majority of the custom code to a format that is compatible with your new target database. The custom code that the tool converts includes views, stored procedures, and functions. Any code that the tool cannot convert automatically is clearly marked so that you can convert it yourself. AWS DMS helps you migrate your data easily and securely with minimal downtime.

Changing database engines can be daunting. However, the value proposition of a highly scalable, cost-effective, fully managed service such as Amazon Aurora can make the challenge worth it. Assessing and planning for a single database migration to open source is straightforward with AWS SCT and AWS DMS. You can generate an AWS SCT assessment report and get answers about how easy it is to use these tools to migrate your databases.

Many articles and videos are available about making the transition to open source for your databases.

What if you have hundreds or even thousands of databases?
Building an assessment report and planning a move for one, two, or even ten databases to open source is a straightforward process. You probably have enough familiarity with applications that attach to those databases that you can identify which ones to move first. What do you do if you have hundreds or thousands of database instances deployed in and around your organization that you don’t have intimate application knowledge about?

If you’re in a centralized IT department, you might know how many databases you manage, have backups scheduled for them, and be tracking them in your inventory. But you might not have enough detail to plan a large-scale move or prioritize them for analysis. This post talks about how to assess your database portfolio and break it down into a manageable pipeline of migrations to make the process go smoothly.

A collaboration of smaller projects
A detailed project plan for a single heterogeneous database migration consists of 12 steps, as detailed in the following table.

Phase  Description
1      Assessment
2      Database schema conversion
3      Application conversion/remediation
4      Scripts conversion
5      Integration with third-party applications
6      Data migration
7      Functional testing of the entire system
8      Performance tuning
9      Integration and deployment
10     Training and knowledge transfer
11     Documentation and version control
12     Post-production support

At a high level, and for the purposes of this blog, the process for planning a migration project can be grouped into five areas of work: Analysis and Planning, Schema Migration, Data Migration, Application Migration, and Cutover. How long your migration takes generally depends on how long you spend iterating over the migration and testing phases. If you are planning several migrations in tandem, you should understand when you start which ones are likely to take the most time and which ones can be tackled first or in rapid succession.

SCT Workflow

Prioritizing which databases to analyze first when moving to open source often hinges on how much proprietary database code they use. The volume of proprietary code in use affects the amount of time it takes to translate and test that code during the migration phases of your project. After you establish categories and criteria, you can categorize your databases by workload using dictionary queries, as shown in the example that follows.
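As a minimal sketch, assuming an Oracle source, the following dictionary query gives a rough measure of how much stored PL/SQL code each schema carries; adjust the filters and the thresholds you use to bucket the results to fit your own categories:

-- Count lines of stored PL/SQL code per schema and object type.
SELECT owner,
       type,
       COUNT(*) AS code_lines
  FROM dba_source
 WHERE owner NOT IN ('SYS', 'SYSTEM')
 GROUP BY owner, type
 ORDER BY code_lines DESC;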

(more…)

How to Use SSL with the AWS Schema Conversion Tool Data Extractors

In another blog post, we described how to use the AWS Schema Conversion Tool data extractors (AWS SCT data extractors) to migrate your legacy data warehouse to Amazon Redshift. Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to analyze all your data using your existing business intelligence tools. Migrating your data warehouse to Amazon Redshift can substantially improve query and data load performance, increase scalability, and save costs. The Schema Conversion Tool is provided free of charge by AWS to help you migrate your legacy data warehouse to the cloud.

This blog post gives you a quick overview of how you can use Secure Sockets Layer (SSL) encryption with the SCT data extractors. SSL is the standard security technology for establishing an encrypted link between applications. With this link, all data passed between the applications remains private and unaltered.

If you’re not familiar with the overall process of how to install and configure the SCT extractors, read the blog post How to Migrate Your Data Warehouse to Amazon Redshift Using the AWS Schema Conversion Tool Data Extractors before digging into the SSL setup.

Architecture
You can install the SCT extractors on any servers with connectivity to your source data warehouse. The SCT client application issues commands to the extractors to pull data from the source system, upload the data to Amazon S3, and copy the data into Amazon Redshift. You can install multiple extractors on the same machine. Alternatively, you can separate them as needed across many machines. Which approach you might prefer depends on your available server resources and network capacities. In general, a best practice is to locate the data extractors close to your data warehouse to reduce network traffic and latency.

The extract files produced by the extractors are compressed, using lossless LZO compression, then copied to an S3 bucket. The files are compressed to reduce network transfer time and to reduce storage costs on S3. After the files are uploaded to S3, the extractor triggers Amazon Redshift to copy the data into the target tables using the standard Amazon Redshift COPY command.
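For reference, the COPY statement that the extractor issues is conceptually similar to the following hand-written sketch. The table name, S3 path, and IAM role are placeholders, and the exact options that SCT generates can differ:

-- Load LZO-compressed extract files from Amazon S3 into the target table.
COPY target_schema.target_table
FROM 's3://your-extract-bucket/extracts/target_table/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
LZOP
DELIMITER '|';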

Now that we know what the architecture looks like, let’s take a look at the steps involved in setting up SSL for SCT and the data extractors.

In the following example, we use Amazon EC2 running Red Hat Linux to host the extractors. However, you can install the extractors on-premises, and on many platforms, including Microsoft Windows, Ubuntu, macOS, and Red Hat.

The SCT client in this example runs on a Mac laptop. Because the client offloads data processing chores to the extractors, a small desktop or laptop is perfectly suited to run the client.

(more…)

How to Migrate Your Data Warehouse to Amazon Redshift Using the AWS Schema Conversion Tool Data Extractors

This blog post gives you a quick overview of how you can use the AWS Schema Conversion Tool (AWS SCT) data extractors to migrate your existing data warehouse to Amazon Redshift. Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to analyze all your data using your existing business intelligence tools. Migrating your data warehouse to Amazon Redshift can substantially improve query and data load performance, increase scalability, and save costs. AWS provides the Schema Conversion Tool free of charge to help you migrate your legacy data warehouse to the cloud.

Customers usually use SCT to migrate their schema objects (table definitions, indexes, triggers, and other execution logic) from legacy data warehouse platforms. SCT has also been enhanced to migrate the actual data records. This feature, called the SCT data extractors, can migrate your data from any of these legacy platforms (Oracle, Greenplum, Netezza, Teradata, Microsoft SQL Server, or Vertica) using a parallel, secure, and fault-tolerant architecture.

Typically, there are two stages to migrating a data warehouse. In the first stage, a full copy of the legacy data is replicated to the target system. We call this stage the “full load.”  The full load can take a considerable amount of time depending on how much data there is to move. This being the case, any intervening changes are usually captured in a subsequent change data capture stage.

The SCT extractors are designed to facilitate the full load process and include two key features to support it:

  • Quickly move large numbers of records using multi-threaded data extraction and load processes
  • In case of failure, an extraction process can restart from the point of failure, rather than from the beginning of the entire data extract

For the second stage, SCT supports point-in-time data extracts so that the “change deltas” accumulated since the full load can be captured and migrated separately.

Let’s take a look at the architecture of the SCT extractors and how they can help you accomplish the full data load process.

Architecture
You can install the SCT extractors on any server machines with connectivity to your source data warehouse. The SCT client application issues commands to the extractors to pull data from the source system, upload the data to Amazon S3, and copy the data into Amazon Redshift. You can install multiple extractors on the same machine, or separate them as needed across many machines. Which is best depends on your available server resources and network capacities. In general, a best practice is to locate the data extractors close to your data warehouse to reduce network traffic and latency.

The extractors provide a fully managed framework for your data warehouse migration. The data extractors write their extract files to local storage. The extract files are compressed, using lossless LZO compression, then copied to an S3 bucket. The files are compressed to reduce network transfer time and to reduce storage costs on S3. After the files are uploaded to S3, the extractor triggers Amazon Redshift to copy the data into the target tables using the standard Amazon Redshift COPY command. Each of these steps can be repeated if necessary with no unexpected consequences on the target database.

For security purposes, SCT supports encrypted communication between the SCT client and the extractors using Secure Sockets Layer (SSL). If you want to learn more about using SSL with the extractors, see the companion blog post How to Use SSL with the AWS Schema Conversion Tool Data Extractors. However, before moving on to SSL, we recommend that you read this post first to understand the overall extractor architecture and setup.

Now that we know what the architecture looks like, let’s take a look at the steps involved in setting up the data extractors.

The scenario we work through in this post is migrating an Oracle data warehouse to Amazon Redshift. We assume that you’ve already converted your source schema to Amazon Redshift and applied it on the target database. We won’t try to cover all possible migration scenarios—our intent is to give you a deep dive into the technology so you can confidently manage your migration projects.

Our Oracle database is hosted in Amazon RDS, but the same migration considerations apply to on-premises sources. We’ll install two extractor agents running on Red Hat Linux on Amazon EC2 instances. We’ll only work through the configuration of one extractor; the process is the same for both. The servers are configured as EC2 t2.medium instances (two virtual CPUs, 4 GB of memory, and 100 GB of local storage). The instance size values can be tuned for your particular data volume, latency, and performance requirements.

The extractors are controlled by an SCT client running on a Mac laptop. Because the client offloads data processing chores to the extractors, a small desktop or laptop is perfectly suited to run the client.

A great feature of the SCT client is that it doesn’t need to be online while the data extractors are running. The SCT client can reconnect to the extractors at any time by opening the specific migration project. One client can manage many extraction projects. The data extractors continue to operate even if the SCT client is unreachable for any reason.

(more…)

Strategy for Migrating Partitioned Tables from Oracle to Amazon RDS for PostgreSQL and Amazon Aurora with PostgreSQL Compatibility

Sandeep Kariro is a big data architect and Akm Raziul is a database architect at Amazon Web Services.

Enterprises have been using closed-source database systems for more than four decades. Managing data with these systems has been a critical part of running a successful business in every organization. You have to make a considerable investment up front to obtain the required hardware and enterprise licenses before you can even start building the system. With cloud computing gaining momentum in recent years and technology maturing in the open-source world, interest has been growing in moving critical traditional online transaction processing (OLTP) database workloads to open-source systems such as PostgreSQL and MySQL.

Migrating one database system to another requires significant effort and planning. Even though the basics around the database system generally stay the same, implementation and operational activity on each of them can vary quite a bit.

The AWS Schema Conversion Tool (AWS SCT) helps migrate Oracle schemas and custom stored procedure code to a PostgreSQL-compatible target with minimal effort. AWS SCT generates a migration assessment report that clearly identifies objects and code that require manual conversion or rework. One critical need for high-volume OLTP applications is to have the data partitioned for performance reasons. During the initial assessment, AWS SCT highlights the Oracle partitions that require manual intervention. This is because current versions of PostgreSQL (as of 9.6.2) handle table partitions differently from Oracle, SQL Server, and MySQL.

In this post, we discuss how to convert Oracle Database native range and list partitions to PostgreSQL native range and list partitions.

Range and list partition definitions in Oracle
Range: Each partition holds a range of values based on the partition key. For example, you can partition data based on a date field, with each partition holding data for a date range such as a week, a month, or a quarter. The first partition of the table is always open-ended toward lower values. You can optionally make the last partition open-ended as well by defining its upper bound as MAXVALUE. A range partition key can include up to 16 columns.

The following is an example of a table created with a range partition on the create date. The table is created with three partitions. The first partition holds all rows with a create date before 01-JAN-2017. The second partition holds rows with a create date from 01-JAN-2017 through 31-DEC-2017. The third partition holds all remaining rows, with a create date of 01-JAN-2018 or later.

CREATE TABLE person (
    id        NUMBER(10) NOT NULL,
    name      VARCHAR2(50),
    gender    VARCHAR2(1),
    create_dt DATE DEFAULT sysdate
)
PARTITION BY RANGE (create_dt) (
    PARTITION p1 VALUES LESS THAN (TO_DATE('01-JAN-2017', 'DD-MON-YYYY')),
    PARTITION p2 VALUES LESS THAN (TO_DATE('01-JAN-2018', 'DD-MON-YYYY')),
    PARTITION p3 VALUES LESS THAN (MAXVALUE)
);
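For comparison, the following sketch shows roughly how the same table could be declared with native range partitioning in PostgreSQL 11 or later, which supports MINVALUE and MAXVALUE partition bounds. On 9.6, the version discussed in this post, you would instead emulate the partitions with table inheritance, CHECK constraints, and an insert trigger:

-- Rough PostgreSQL 11+ equivalent of the Oracle range-partitioned table above.
CREATE TABLE person (
    id        NUMERIC(10) NOT NULL,
    name      VARCHAR(50),
    gender    VARCHAR(1),
    create_dt DATE DEFAULT CURRENT_DATE
) PARTITION BY RANGE (create_dt);

CREATE TABLE person_p1 PARTITION OF person
    FOR VALUES FROM (MINVALUE) TO ('2017-01-01');
CREATE TABLE person_p2 PARTITION OF person
    FOR VALUES FROM ('2017-01-01') TO ('2018-01-01');
CREATE TABLE person_p3 PARTITION OF person
    FOR VALUES FROM ('2018-01-01') TO (MAXVALUE);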

(more…)

How to Migrate Your Oracle Database to PostgreSQL

Knievel Co is a database engineer at Amazon Web Services.

This blog post provides an overview on how you can migrate your Oracle database to PostgreSQL. The two biggest parts of a database migration are the schema conversion and the data replication. We go over how to use the AWS Schema Conversion Tool (AWS SCT) and the AWS Database Migration Service (AWS DMS) to tackle these two parts.

Before we cover SCT and DMS, you should take some preliminary steps that have proven helpful in every migration. One way to make the migration easier is to have what is usually called a modernization phase before the migration. This phase involves taking an inventory of objects in your Oracle database and then making a few decisions.

First, deprecate any objects that are no longer needed. Don’t waste time migrating objects that no one cares about. Also, purge any historical data that you no longer need. You don’t want to waste time replicating data you don’t need, for example, temporary tables and backup copies of tables from past maintenance. Second, move flat files and long strings stored in LOBs, CLOBs, LONGs, and so on into Amazon S3 or Amazon DynamoDB. This process requires client software changes, but it reduces the complexity and size of the database and makes the overall system more efficient. Last, try to move any PL/SQL packages and procedures, especially those containing business logic, back to the client software. These objects need to be changed manually anyway if SCT cannot translate them.

The following steps are suggested for migrating to a different database engine (Oracle to PostgreSQL in this case). If you aren’t moving to a different platform, you might find more appropriate native tools or other techniques to move your database.

  1. Create your schema in the target database.
  2. Drop foreign keys and secondary indexes on the target database, and disable triggers (see the sketch after this list).
  3. Set up a DMS task to replicate your data – full load and change data capture (CDC).
  4. Stop the task when the full load phase is complete, and recreate foreign keys and secondary indexes.
  5. Enable the DMS task.
  6. Migrate tools and software, and enable triggers.
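As a rough illustration of steps 2, 4, and 6 on a PostgreSQL target, the following statements disable triggers and drop a secondary index and a foreign key before the full load, and then put them back afterward. All table, constraint, and index names here are hypothetical:

-- Before the DMS full load (step 2).
ALTER TABLE hr.employee DISABLE TRIGGER ALL;
ALTER TABLE hr.employee DROP CONSTRAINT IF EXISTS fk_employee_dept;
DROP INDEX IF EXISTS hr.idx_employee_last_name;

-- After the full load completes (step 4).
ALTER TABLE hr.employee ADD CONSTRAINT fk_employee_dept
    FOREIGN KEY (dept_id) REFERENCES hr.department (dept_id);
CREATE INDEX idx_employee_last_name ON hr.employee (last_name);

-- At cutover, after tools and software are migrated (step 6).
ALTER TABLE hr.employee ENABLE TRIGGER ALL;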

Create your schema in the target database

Begin your migration by taking a look at the schema that you want to migrate. In our case, we use the AWS Schema Conversion Tool (AWS SCT) to perform the analysis. When you start the application, you need to create a new project, with the source being Oracle and the target being PostgreSQL. When you’re connected, select the name of the schema that you want to migrate on the left side. Right-click the schema name and choose Convert Schema. Then choose View / Assessment Report View.

The AWS SCT assessment report is a high-level overview of the effort required to convert your Oracle database to PostgreSQL. The following is an example of what the assessment report looks like:

(more…)

Migrate Your Procedural SQL Code with the AWS Schema Conversion Tool

Database administrators and developers rely on relational databases to store data for applications. As Forbes noted in 2016, the development of open source offerings for database management systems like PostgreSQL is causing a growing number of enterprises to migrate to lower-cost solutions for their data storage. The move to the cloud often provides an excellent opportunity to make such a change.

If your system contains a lot of business logic in the database, it can be quite a large task to translate between SQL dialects line by line. Fortunately, Amazon Web Services offers the AWS Schema Conversion Tool (AWS SCT), a standalone application with a project-based user interface that does the heavy lifting of translating SQL dialects for you.

Using AWS SCT
This post demonstrates how easy it is to migrate procedural code from Oracle to PostgreSQL using AWS SCT. It shows how to spot the differences between the two dialects and how to modify the code when needed. You can use AWS SCT to generate a high-level overview of the effort required to convert an Oracle database to Amazon RDS for PostgreSQL. In the example, the tool runs on Windows, but it can run on a number of different operating systems. To download AWS SCT, see Installing and Updating the AWS Schema Conversion Tool. For general information about the tool, start with What Is the AWS Schema Conversion Tool?

The example uses an Oracle database on Amazon EC2 and the well-known SCOTT/TIGER database as a starting point. Three stored procedures are added to demonstrate how AWS SCT handles common dialect translations. For information about installing the base SCOTT schema objects, see the Oracle FAQ wiki. You can download additional Oracle procedures and the translated PostgreSQL examples from this repository.
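To give a flavor of the dialect differences involved, here is a small, hypothetical Oracle procedure (not one of the repository examples) alongside one possible PostgreSQL rendering. Note the NVL-to-COALESCE change and the conversion of the procedure into a PL/pgSQL function:

-- Hypothetical Oracle source: raise an employee's salary by a percentage.
CREATE OR REPLACE PROCEDURE raise_salary (p_empno IN NUMBER, p_pct IN NUMBER)
AS
BEGIN
    UPDATE emp
       SET sal = sal * (1 + NVL(p_pct, 0) / 100)
     WHERE empno = p_empno;
END raise_salary;
/

-- One possible PostgreSQL rendering of the same logic.
CREATE OR REPLACE FUNCTION raise_salary (p_empno numeric, p_pct numeric)
RETURNS void
LANGUAGE plpgsql
AS $$
BEGIN
    UPDATE emp
       SET sal = sal * (1 + COALESCE(p_pct, 0) / 100)
     WHERE empno = p_empno;
END;
$$;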

Prerequisites
The example in this post requires the following:

  • A source Oracle database with SCOTT schema objects installed
  • Three additional stored procedures:
    Procedure        Source file
    emp_by_job       oracle-emp-by-job.sql
    get_emp_info     oracle-get-emp-info.sql
    employee_report  oracle-employee-report.sql
  • Target Amazon RDS for PostgreSQL database
  • AWS SCT installed

Generate an assessment report
Typically, the first step in a migration is to understand its feasibility and the effort it will require. You can use AWS SCT to generate a high-level overview of the work ahead. In this example, open the context (right-click) menu for the SCOTT schema, and choose Create Report. The assessment report provides a high-level summary of how much of the schema can be converted automatically from Oracle to PostgreSQL, and the work that will be left after conversion.

The following screenshot shows the results of the assessment.

(more…)

How to Migrate Your Oracle Data Warehouse to Amazon Redshift Using AWS SCT and AWS DMS

Shree Kenghe is a solutions architect at Amazon Web Services.

This blog post gives you a quick overview of how you can use the AWS Schema Conversion Tool (AWS SCT) and AWS Database Migration Service (AWS DMS) to help you migrate your existing Oracle data warehouse to Amazon Redshift. Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to analyze all your data using your existing business intelligence tools. Migrating your Oracle data warehouse to Amazon Redshift can substantially improve query and data load performance, increase scalability, and save costs.

When migrating your database from one engine to another, you have two major things to consider: the conversion of the schema and code objects, and the migration and conversion of the data itself. AWS has tools to help do both. You can convert schema and code with AWS SCT and migrate data with AWS DMS. AWS DMS helps you migrate your data easily and securely with minimal downtime.

Let’s take a look at the architecture if you are migrating your data warehouse from Amazon RDS for Oracle to Amazon Redshift.

AWS SCT is a client application that can be installed on a workstation, laptop, or Amazon EC2 instance. AWS DMS creates a replication instance that runs on an Amazon EC2 instance in a VPC. You use this replication instance to perform the database migration. The replication instance to be used by DMS, the RDS Oracle database, and Amazon Redshift should all be in the same region. Ideally, they should all be in the same account within the same Availability Zone.

Now that we know how the architecture looks, let’s take a look at the steps to take.

(more…)

Introducing Data Extractors in AWS Schema Conversion Tool Version 1.0.602

We are pleased to announce the recent release of version 1.0.602 of the AWS Schema Conversion Tool (AWS SCT). The new version includes a key new capability: data migration agents that extract data from Oracle, MS SQL Server, Teradata, IBM Netezza, Greenplum, and HPE Vertica data warehouses, and then prepare, optimize, and upload it to Amazon Redshift.

The new data migration agents are installed locally and designed to extract data from data warehouses based on Oracle version 11, MS SQL Server version 2008 and later, Teradata version 14.10 and later, IBM Netezza version 7.0.3 and later, Greenplum version 4.3 and later, and HPE Vertica version 7.2.2 and later. When data is extracted, the AWS Schema Conversion Tool optimizes it for Amazon Redshift and saves it in local files. AWS SCT can automatically upload these files to Amazon Redshift, or you can transfer them to an Amazon Simple Storage Service (Amazon S3) bucket and upload them to Amazon Redshift manually. For large datasets, you can use AWS Snowball to ship the data to AWS.

The migration agents are designed to extract data in parallel and independently. For example, with a large Oracle data warehouse, SCT automatically distributes work among all available migration agents. It manages the agents to extract data from the different partitions and tables in the schema in the way that is most optimized for Amazon Redshift, consolidates the data, and saves it in local files. The migration agents work completely independently of SCT, and you can replace them if needed without losing any work. This independent, parallel execution capability not only accelerates data extraction but also withstands the failure of any one agent.

The new migration agents are supported on Red Hat and Debian Linux distributions, and the SCT client can run on Windows, Mac, Fedora, or Ubuntu. For more information about installation, security settings, and configuration, see Installing Migration Agents in the AWS SCT User Guide.

(more…)