By Shakil Langha, Amazon Web Services
Platform migration for Oracle databases varies in complexity with the size of the data and the number and variety of database objects. Depending on the complexity of your database and the size of your data, you might be able to use one or more of the techniques described in this article.
Before you use any of these migration techniques, we recommend the best practice of taking a backup of your database. You can back up your Amazon RDS instances by creating snapshots. Later, you can restore the database from the snapshots using the Restore from DB Snapshot or Restore to Point In Time options on the RDS tab of the AWS Management Console. You can also use the Amazon RDS command line tools rds-restore-db-instance-from-db-snapshot or rds-restore-db-instance-to-point-in-time. These and other best practices are covered in the appendix to this document.
Oracle SQL Developer
Oracle SQL Developer is a graphical Java tool distributed without cost by Oracle. This tool can be installed on your desktop computer (Windows, Linux, or Mac). Oracle SQL Developer implements options for migrating data between two Oracle databases or for migrating data from other databases, such as MySQL, to Oracle. Oracle SQL Developer is best suited for migrating small databases, although it can also handle a large number of objects. We recommend that you read the Oracle SQL Developer product documentation before you begin migrating your data.
After you install SQL Developer on one of your servers or on your desktop, you can use it to connect to your source and target databases. Use the Database copy command on the Tools menu to copy your data to your new AWS RDS instance.
You can learn more about Oracle SQL Developer, and download it, at http://www.oracle.com/technetwork/developer-tools/sql-developer.
Oracle also documents how to migrate to Oracle from other databases, such as MySQL and SQL Server: http://www.oracle.com/technetwork/database/migration.
Oracle Export/Import Utility
The export/import utilities from Oracle are best suited for migrations where the data size is small and data types such as binary float and double are not required. The import process creates the schema objects so you do not need to run a script to create them beforehand, making this process well suited for databases with small tables. The following example demonstrates how these utilities can be used to export and import specific tables.
There are other variations of the export and import commands that might be better suited to your needs. See Oracle's documentation for full details.
Export/Import Utility Example
Export the tables from the source database using the command below. Substitute username/password as appropriate.
exp cust_dba@ORCL FILE=exp_file.dmp TABLES=(tab1,tab2,tab3) LOG=exp_file.log
The export process creates a binary dump file that contains both the schema and data for the specified tables. Now this schema and data can be imported into a target database using the command:
imp cust_dba@targetdb FROMUSER=cust_schema TOUSER=cust_schema \
  TABLES=(tab1,tab2,tab3) FILE=exp_file.dmp LOG=imp_file.log
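The exp and imp invocations follow a fixed pattern, so when you have many table groups to move you might generate the command lines with a small script. A minimal sketch (the helper names and table lists are illustrative, not part of the Oracle tooling; the script only composes the strings, and the utilities themselves must still be installed and run):

```python
def exp_command(connect, tables, dump_file, log_file):
    """Compose an Oracle exp command line for a list of tables."""
    table_list = ",".join(tables)
    return (f"exp {connect} FILE={dump_file} "
            f"TABLES=({table_list}) LOG={log_file}")

def imp_command(connect, schema, tables, dump_file, log_file):
    """Compose the matching imp command line."""
    table_list = ",".join(tables)
    return (f"imp {connect} FROMUSER={schema} TOUSER={schema} "
            f"TABLES=({table_list}) FILE={dump_file} LOG={log_file}")

# Build the same commands shown above
print(exp_command("cust_dba@ORCL", ["tab1", "tab2", "tab3"],
                  "exp_file.dmp", "exp_file.log"))
print(imp_command("cust_dba@targetdb", "cust_schema",
                  ["tab1", "tab2", "tab3"], "exp_file.dmp", "imp_file.log"))
```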
Oracle Data Pump
Oracle Data Pump is a long-term replacement for the Export/Import utilities. Data Pump contains many enhancements over Export/Import, and it is much faster at migrating large data sets between databases. In addition, Data Pump does not require you to create an export file — you can perform a network import to read data directly from a source database and write it to a target database. The following example shows how to use Data Pump to copy an entire schema from an Oracle database running on Amazon EC2 into an Amazon RDS instance.
- Log in to the EC2 instance as the oracle user. You will need to ensure that the database is running, as well as the Oracle Net listener process. Use the lsnrctl start command to start the listener, so that it can accept incoming connections to your database.
- You will need to have two entries in the tnsnames.ora file — one for the local database, and one for the RDS instance. Here is a tnsnames.ora excerpt:
EC2DB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = ip-12-46-107-195.ec2.internal)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = ec2db))
  )

RDSDB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = mydb.cg921fiuepnr.us-east-1.rds.amazonaws.com)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = rdsdb))
  )
Note that EC2DB is a "loopback" connection, and that the hostname is the private DNS name for the EC2 instance. You can obtain the DNS name from the Amazon EC2 Management Console. RDSDB points to your Amazon RDS instance. The hostname is the endpoint of the RDS instance. You can obtain the endpoint from the Amazon RDS Management Console.
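Because both entries share the same shape, you can render tnsnames.ora entries programmatically if you have many databases to wire up. A minimal sketch (the function name is illustrative, not an Oracle API):

```python
def tns_entry(alias, host, port, service_name):
    """Render one tnsnames.ora entry for a TCP connection."""
    return (
        f"{alias} =\n"
        f"  (DESCRIPTION =\n"
        f"    (ADDRESS = (PROTOCOL = TCP)(HOST = {host})(PORT = {port}))\n"
        f"    (CONNECT_DATA = (SERVICE_NAME = {service_name}))\n"
        f"  )\n"
    )

# Render the RDSDB entry from the excerpt above
print(tns_entry("RDSDB", "mydb.cg921fiuepnr.us-east-1.rds.amazonaws.com",
                1521, "rdsdb"))
```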
- Connect to your local Oracle database and create a database link pointing to the RDS instance:
sqlplus system/manager
...
create database link rdsdb connect to master identified by xxxx using 'rdsdb';
- Next, connect to your RDS instance and create a database link pointing to the database on EC2:
sqlplus master/xxxx@rdsdb
...
create database link ec2db connect to system identified by yyyy
  using '(DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = ip-12-46-107-195.ec2.internal)(PORT = 1521))(CONNECT_DATA = (SERVICE_NAME = ec2db)))';
The fully qualified connect string is necessary because the RDS instance cannot access the tnsnames.ora file on your EC2 instance.
- Now suppose that you have a user named CUST_SCHEMA in your local database, and that user owns several database tables, indexes, and other database objects. You want to import this user, along with everything that it owns, into your RDS instance. You can use this command to perform a network import from the EC2 database to the RDS instance:
impdp master/xxxx@rdsdb directory=data_pump_dir network_link=ec2db schemas=cust_schema
The impdp command requires a directory object; by default, all Amazon RDS instances have a preconfigured directory named DATA_PUMP_DIR for this purpose. The import takes place using the ec2db database link that you created earlier. Finally, all of the contents of CUST_SCHEMA are duplicated in the RDS instance.
Oracle SQL*Loader
The Oracle SQL*Loader process is well suited to large databases that contain a limited number of objects. Because exporting from a source database and loading into a target database is specific to the schema, the following example creates the sample schema objects, exports the data from a source database, and then loads it into a target database.
Oracle SQL*Loader Example
Create a sample source table using the DDL below.
create table customer_0 tablespace users
as select rownum id, o.*
from all_objects o, all_objects x
where rownum <= 1000000;
On the target Amazon RDS instance, create a destination table that will be used to load the data.
create table customer_1 tablespace users
as select 0 as id, owner, object_name, created
from all_objects
where 1=2;
The data will be exported from the source database to a flat file with delimiters. This example uses SQL*Plus for this purpose. For your data, you will likely need to generate a script that does the export for all the objects in the database.
alter session set nls_date_format = 'YYYY/MM/DD HH24:MI:SS';
set linesize 800 heading off feedback off arraysize 5000 pagesize 0
spool customer_0.out
set markup html preformat on
set colsep ','
select id, owner, object_name, created from customer_0;
spool off
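When many tables must be unloaded, you can generate the spool script rather than write it by hand. A minimal sketch that emits one spool block per table (the function name and table list are illustrative; the output is a SQL*Plus script you would still run against the source database):

```python
def spool_script(tables, date_format="YYYY/MM/DD HH24:MI:SS"):
    """Emit a SQL*Plus script that spools each table to its own flat file."""
    lines = [
        f"alter session set nls_date_format = '{date_format}';",
        "set linesize 800 heading off feedback off arraysize 5000 pagesize 0",
        "set markup html preformat on",
        "set colsep ','",
    ]
    for table in tables:
        lines.append(f"spool {table}.out")
        lines.append(f"select * from {table};")
        lines.append("spool off")
    return "\n".join(lines)

print(spool_script(["customer_0"]))
```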
You need to create a control file to describe the data. Again, depending on your data, you will need to build a script that does this step.
cat << EOF > sqlldr_1.ctl
load data
infile customer_0.out
into table customer_1 APPEND
fields terminated by "," optionally enclosed by '"'
(
  id          POSITION(01:10) INTEGER EXTERNAL,
  owner       POSITION(12:41) CHAR,
  object_name POSITION(43:72) CHAR,
  created     POSITION(74:92) date "YYYY/MM/DD HH24:MI:SS"
)
EOF
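For a database with many tables, hand-editing the column positions in each control file is error prone, so you might generate the files from a column layout. A sketch, where the column tuples (name, start, end, datatype) mirror the layout above (the helper name is illustrative):

```python
def control_file(table, infile, columns):
    """Build a SQL*Loader control file from (name, start, end, datatype) tuples."""
    specs = ",\n".join(
        f"  {name} POSITION({start:02d}:{end:02d}) {datatype}"
        for name, start, end, datatype in columns
    )
    return (
        "load data\n"
        f"infile {infile}\n"
        f"into table {table} APPEND\n"
        "fields terminated by \",\" optionally enclosed by '\"'\n"
        f"(\n{specs}\n)\n"
    )

# The layout used in the example above
layout = [
    ("id", 1, 10, "INTEGER EXTERNAL"),
    ("owner", 12, 41, "CHAR"),
    ("object_name", 43, 72, "CHAR"),
    ("created", 74, 92, 'date "YYYY/MM/DD HH24:MI:SS"'),
]
print(control_file("customer_1", "customer_0.out", layout))
```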
If needed, copy the files generated by the preceding code to a staging area, such as an Amazon EC2 instance. Depending on the size of the files, you might need to make use of utilities such as Tsunami UDP, discussed later in this article.
Finally, import the data using SQL*Loader with the appropriate username and password for the target database.
sqlldr cust_dba@targetdb control=sqlldr_1.ctl BINDSIZE=10485760 READSIZE=10485760 ROWS=1000
Oracle Materialized Views
You can also make use of Oracle materialized view replication to migrate large datasets efficiently. Replication allows you to keep the target tables in sync with the source on an ongoing basis, so the actual cutover to Amazon RDS can be done later, if needed. The replication is set up using a database link from the Amazon RDS instance to the source database.
One requirement for materialized views is that the target database must be able to access the source database. In the following example, access rules on the source database allow the Amazon RDS target database to connect to the source over SQL*Net.
Oracle Materialized View Example
Create a user account on both source and Amazon RDS target instances that can authenticate with the same password.
create user dblink_user identified by password
  default tablespace users
  temporary tablespace temp;
grant create session to dblink_user;
grant select any table to dblink_user;
grant select any dictionary to dblink_user;
Create a database link from the Amazon RDS target instance to the source instance using the newly created dblink_user.
create database link remote_site
  connect to dblink_user identified by password
  using '(description=(address=(protocol=tcp)(host=sourcedb.rds.amazonaws.com)(port=1521))(connect_data=(sid=sourcedb)))'
/
Test the link:
select * from v$instance@remote_site;
Create a sample table with primary key and materialized view log on the source instance.
create table customer_0 tablespace users
as select rownum id, o.*
from all_objects o, all_objects x
where rownum <= 1000000;

alter table customer_0 add constraint pk_customer_0 primary key (id) using index;

create materialized view log on customer_0;
On the target Amazon RDS instance, create a materialized view.
CREATE MATERIALIZED VIEW customer_0 BUILD IMMEDIATE REFRESH FAST AS SELECT * FROM cust_dba.customer_0@remote_site;
Amazon RDS Import Best Practices
Before you begin a large data import, we recommend that you stop applications from accessing the database, create a database snapshot, and disable automated backups. After the import process is completed, turn the automated backups back on again. The sections below describe the details involved in each step.
Stop Applications from Accessing the Target Amazon RDS Instance
If applications cannot access the target instance, the import process performs better because there is less load on the database. This also prevents conflicts between the data being loaded and any application making changes to the data. In addition, if you need to roll back to a prior snapshot, the only changes you will lose will be from the import; you don't need to worry about changes made by any other application.
Create a Database Snapshot
If the target database is already populated with data, we recommend that you take a snapshot of the database before proceeding with the import. In case the import does not go smoothly, or you want to discard the changes, the database can then be restored using the snapshot. While a snapshot is being taken, I/O operations to the database are suspended for a few minutes until the backup process completes. The commands involved in creating a snapshot and restoring from one are provided below.
Take a snapshot of the MyRDS instance and assign it an identifier of preload:
rds-create-db-snapshot [MyRDS] --db-snapshot-identifier=preload
Alternatively, you could use the AWS Management Console Take Snapshot option.
To restore from a snapshot to a new database instance, use the command:
rds-restore-db-instance-from-db-snapshot [MyRDS2] --db-snapshot-identifier=preload
Alternatively, you can use the AWS Management Console Restore from Snapshot option.
To restore from a snapshot to the same instance, you first need to delete the instance so it can be recreated. The commands below take a "final" snapshot of the MyRDS instance before deleting and recreating it:
rds-delete-db-instance [MyRDS] --final-db-snapshot-identifier myrdsfinal
rds-restore-db-instance-from-db-snapshot [MyRDS] --db-snapshot-identifier=preload
Alternatively, you can use the AWS Management Console Delete option with the Create Final Snapshot field set to Yes. Then select the Restore from Snapshot option.
Disable Automated Amazon RDS Backups
Automated backups are required for point-in-time recovery. Data import performance can be enhanced by disabling these automated backups; however, you will no longer be able to restore the database to a specified point in time. This does not prevent you from using prior snapshots to recover the database, and any snapshots you take will remain available. However, disabling automated backups erases all existing backups of the RDS instance, so use your best judgment and weigh the performance improvements against the need for point-in-time recovery.
Disabling automated backups can reduce load time by up to 25 percent and also reduces the amount of storage required. This step is especially recommended if you are importing to an empty database, because you do not need point-in-time recovery capability in that case.
By default, Amazon RDS instances perform automated backups with a one-day retention period. Setting the backup retention period to zero effectively disables these backups. This change requires a database restart.
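If you script your instance management through the AWS API rather than the command line tools, the same change can be made with boto3. A hedged sketch, assuming a hypothetical instance named MyRDSInstance; the live call is commented out because it requires AWS credentials and would restart your database:

```python
# Parameters mirroring the rds-modify-db-instance flags in this section.
params = {
    "DBInstanceIdentifier": "MyRDSInstance",  # illustrative instance name
    "BackupRetentionPeriod": 0,               # 0 disables automated backups
    "ApplyImmediately": True,
}

# import boto3
# rds = boto3.client("rds")
# rds.modify_db_instance(**params)  # triggers a database restart
print(params)
```

To re-enable backups after the import, submit the same call with BackupRetentionPeriod set back to 1.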
rds-modify-db-instance [MyRDSInstance] --apply-immediately --backup-retention-period=0
Alternatively, you can use the AWS Management Console Modify DB Instance option and set the Backup Retention Period to zero.
Enable Automated Backups after Import
Automated backups should be re-enabled after the load process completes. The backup retention period can be reset to the default period of one day. This will require a restart of the database.
rds-modify-db-instance AcmeRDS --apply-immediately --backup-retention-period=1
Alternatively, you could use the AWS Management Console Modify DB Instance option and set the Backup Retention Period to 1.
Transfer Bulk Data Using Tsunami UDP
Tsunami UDP is an open source file transfer protocol that provides faster transfers than FTP. The following example provides a simple way to set up and transfer files using Tsunami. For a complete reference and advanced setup, see the Tsunami UDP Usage Doc.
You can follow these steps to transfer large data files to a staging area on Amazon EC2, and after the data is available within Amazon EC2, use one of the methods described earlier to then populate the RDS instance.
Download and Build Tsunami
Verify that the cvs, autoconf, gcc, and automake packages are installed.
cvs -z3 -d:pserver:email@example.com:/cvsroot/tsunami-udp co -P tsunami-udp
cd tsunami-udp
./recompile.sh
sudo make install
This will install both the Tsunami client and server.
Open Required Ports
Tsunami UDP by default communicates using port 46224 on both TCP and UDP. You will need to ensure that the client and server instances allow communication on these ports.
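Before starting a transfer, you can confirm that the TCP side of port 46224 is reachable from the client. A minimal sketch using a plain socket check (it verifies only TCP reachability, not the UDP data path; the hostname in the example is illustrative):

```python
import socket

def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        return s.connect_ex((host, port)) == 0

# Example: check the Tsunami control port on a server
# tcp_port_open("my-tsunami-server.example.com", 46224)
```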
Start Tsunami Daemon on Server
cd [directory/where/outbound/files/kept]
tsunamid *
Get Files on Client
tsunami connect [remoteserver] get [filename]
AWS Import/Export Service
AWS provides an Import/Export service to facilitate moving large amounts of data into and out of Amazon Simple Storage Service (Amazon S3) using portable storage devices. The data transfer between the devices and Amazon S3 is conducted over high-speed internal networks, making this service an attractive option for very large data sets.
For purposes of data migration to an Amazon RDS instance, you can use the Import service to get the data into Amazon S3 storage. Next, copy the data files to an Amazon EC2 instance used for staging purposes, and from there use the other mechanisms provided in this article to load the data into Amazon RDS.
References
Oracle Export Utility: http://download.oracle.com/docs/cd/E14072_01/server.112/e10701/original_export.htm
Oracle Import Utility: http://download.oracle.com/docs/cd/E14072_01/server.112/e10701/original_import.htm
Oracle SQL*Loader Utility: http://download.oracle.com/docs/cd/E14072_01/server.112/e10701/ldr_concepts.htm
Oracle Database Link Reference: http://download.oracle.com/docs/cd/E11882_01/server.112/e17118/statements_5005.htm
Oracle Advanced Replication: http://download.oracle.com/docs/cd/E14072_01/server.112/e10706/repoverview.htm
Oracle SQL Developer: http://www.oracle.com/technetwork/developer-tools/sql-developer
Using Security Groups on AWS: http://docs.amazonwebservices.com/AWSEC2/latest/UserGuide/index.html?using-network-security.html
Tsunami UDP: http://sourceforge.net/projects/tsunami-udp/
Tsunami Usage Doc: http://tsunami-udp.cvs.sourceforge.net/viewvc/tsunami-udp/docs/USAGE.txt
AWS Import/Export Service: http://aws.amazon.com/importexport/