AWS Database Blog

Automating AWS DMS Migration Tasks

This blog post gives you a quick overview of how to schedule AWS Database Migration Service (AWS DMS) migration tasks to automate your migration. You schedule these tasks by using native operating system tools on either Linux or Microsoft Windows.

AWS DMS helps you migrate databases to AWS quickly and securely. You can migrate your data to and from most widely used commercial and open-source databases, such as Oracle, Microsoft SQL Server, and PostgreSQL. The service supports homogeneous migrations, such as Oracle to Oracle, and also heterogeneous migrations between different database platforms, such as Oracle to PostgreSQL or MySQL to Oracle.

To work with AWS DMS, either your source or target database must be in AWS. Currently, DMS doesn’t support migrating from an on-premises database to another on-premises database.

The AWS DMS migration process consists of setting up a replication instance, source and target endpoints, and a replication task. The replication task runs on the replication instance and migrates data from the source endpoint to the target endpoint.

To automate migration tasks, you use the AWS Command Line Interface (AWS CLI) to perform the database migration. If you are a first-time AWS CLI user, we recommend that you read the AWS CLI documentation first to get accustomed to how the CLI is used and configured. Before you automate migration tasks, complete the following steps:

  1. Create an AWS Identity and Access Management (IAM) user.
  2. Set up required permissions for the IAM user.
  3. Set up the roles required to use AWS DMS.
  4. Set up the AWS CLI.
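Once those steps are done, the scheduled job boils down to invoking the AWS CLI on a timer. The sketch below builds that command in Python; the task ARN is a placeholder for your own resource, and `reload-target` is one of the valid task types (the others are `start-replication` and `resume-processing`).

```python
# Sketch: build the `aws dms start-replication-task` command that a
# cron job (Linux) or Task Scheduler job (Windows) would run.
# The task ARN below is a placeholder, not a real resource.

def build_start_task_command(task_arn, task_type="reload-target"):
    """Return the AWS CLI argument list to start a replication task."""
    return [
        "aws", "dms", "start-replication-task",
        "--replication-task-arn", task_arn,
        "--start-replication-task-type", task_type,
    ]

cmd = build_start_task_command(
    "arn:aws:dms:us-east-1:123456789012:task:EXAMPLETASK")
print(" ".join(cmd))
```

Wrapped in a small shell script, this command can then be scheduled with a crontab entry such as `0 2 * * * /usr/local/bin/start-dms-task.sh`, or with an equivalent Windows Task Scheduler job.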


Managing Amazon ElastiCache with Terraform

Nic Jackson is Developer Advocate at HashiCorp.

Developers continue to pick Redis as their favorite NoSQL data store (see the Stack Overflow Developer Survey 2017). Amazon ElastiCache provides easy, fast, and highly available Redis on AWS. ElastiCache for Redis can be deployed via the AWS Management Console, AWS SDK, Amazon ElastiCache API, AWS CloudFormation, and through deployment tools like HashiCorp Terraform. In this post, we show how to easily deploy Amazon ElastiCache for Redis with Terraform.

Amazon ElastiCache is a web service that makes it easy to deploy, operate, and scale an in-memory data store or cache in the cloud. It’s often used to improve application performance by reading from a fast in-memory data store instead of a slower disk-based database. Currently, ElastiCache supports two different engines:

  • Redis
  • Memcached

The source code that accompanies this post is available in a GitHub repository. We reference the source files in the repository throughout the post. So before reading on, you might want to clone it now, so you have it handy.

What is Terraform?
Terraform is a tool for managing infrastructure using the principles of infrastructure as code. With infrastructure-as-code tools such as Terraform and AWS CloudFormation, your entire infrastructure setup is modeled in a declarative language. Terraform expresses this declarative model in the HashiCorp Configuration Language (HCL), through stanzas that define the many configurable resources of your cloud infrastructure. Defining your infrastructure this way gives you a predictable, consistent method of re-creating any component of your application infrastructure. It has the added benefit that infrastructure defined as code can be versioned and shared easily with colleagues.

How Terraform works
Terraform is broken down into three main components:

  • Providers
  • Data sources
  • Resources

A provider defines the resources and data sources for a particular infrastructure platform, such as AWS. Resources allow you to create and destroy infrastructure services such as Amazon EC2 instances, virtual private clouds (VPCs), and, in the case of our example, ElastiCache clusters. Data sources allow you to query the state of existing resources; for example, you can retrieve the Availability Zones for a given AWS Region or return the details of an existing server or infrastructure component.
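A minimal HCL sketch of these three components might look as follows. The Region, cluster name, and node type are illustrative placeholders, and the resource arguments are abbreviated to the essentials:

```hcl
# Provider: configures the AWS plugin for a given Region.
provider "aws" {
  region = "us-west-2"
}

# Data source: queries the state of existing infrastructure.
data "aws_availability_zones" "available" {}

# Resource: declares infrastructure for Terraform to create and manage.
resource "aws_elasticache_cluster" "example" {
  cluster_id      = "example-redis"
  engine          = "redis"
  node_type       = "cache.t2.micro"
  num_cache_nodes = 1
}
```

Running `terraform plan` shows what would change, and `terraform apply` creates the cluster.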


Introducing Amazon S3 and Microsoft Azure SQL Database Connectors in AWS Database Migration Service

We are excited to announce the addition of two new database connectors in AWS Database Migration Service (AWS DMS): Amazon S3 as a source and Microsoft Azure SQL Database as a source. You can now migrate data from these two new sources to all AWS DMS supported targets.

Amazon S3 as a source
You can now replicate data from an Amazon S3 bucket to our supported targets. These targets include relational databases (such as Oracle and Amazon Aurora), data warehouses (Amazon Redshift), NoSQL databases (Amazon DynamoDB), and Amazon S3 buckets. After performing a full load of the data files in the Amazon S3 bucket, AWS DMS can optionally replicate data changes to the target endpoint (see more information later in this post).

How to add an Amazon S3 source to AWS DMS
To add an Amazon S3 source to AWS DMS, you must provide access to an Amazon S3 bucket that contains one or more data files and a JSON document that describes the format of the data in those files.

The source data files must be in comma-separated-values (CSV) format and must be present in the Amazon S3 bucket before the full load starts.

  1. Sign in to the AWS Management Console, and choose Database Migration Service.
  2. In the navigation pane, choose Endpoints, and then choose Create endpoint.
  3. Choose the following:
    1. Endpoint type: Source
    2. Source engine: s3
  4. For Service Access Role ARN, provide the Amazon Resource Name (ARN) of an IAM role that has read access to the Amazon S3 bucket that you’re using as a source. For more details, see Prerequisites When Using S3 as a Source for AWS DMS.
  5. Add a bucket name and (optionally) a bucket folder to define the path where AWS DMS finds the CSV files.
  6. For External Table Definition, define or upload a JSON document that describes the structure of the tables and columns in the source data files. For example, suppose that a source CSV file contains the following rows:

     1,Warriors,Celtics,Lakers
     2,Patriots,Seahawks,Cardinals
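An external table definition describing rows like these might look as follows. This is a sketch: the table, path, and column names are illustrative choices, and the key names follow the format documented for the S3 source in AWS DMS:

```json
{
  "TableCount": "1",
  "Tables": [
    {
      "TableName": "teams",
      "TablePath": "dms_sample/teams/",
      "TableOwner": "dms_sample",
      "TableColumns": [
        { "ColumnName": "id", "ColumnType": "INT8", "ColumnNullable": "false", "ColumnIsPk": "true" },
        { "ColumnName": "team1", "ColumnType": "STRING", "ColumnLength": "30" },
        { "ColumnName": "team2", "ColumnType": "STRING", "ColumnLength": "30" },
        { "ColumnName": "team3", "ColumnType": "STRING", "ColumnLength": "30" }
      ],
      "TableColumnsTotal": "4"
    }
  ]
}
```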


Viewing Amazon Elasticsearch Service Slow Logs

Today, Amazon Elasticsearch Service (Amazon ES) announced support for publishing slow logs to Amazon CloudWatch Logs. This new feature enables you to publish Elasticsearch slow logs from your indexing and search operations and gain insights into the performance of those operations.

You can enable slow logs to identify whether a performance issue on your cluster is being caused by particular queries or is due to changes in usage. You can then use that information to work with your users to optimize their queries or index configuration to address the problem.

You can turn the logs on and off at will, paying only the CloudWatch charges for the log data that you generate.

Set up delivery of slow logs for your domain
To enable slow logs for your domain, sign in to the AWS Management Console and choose Elasticsearch Service. On the Amazon ES console, choose your domain name in the list to open its dashboard. Then choose the Logs tab.


Replicating Amazon EC2 or On-Premises SQL Server to Amazon RDS for SQL Server

Amazon RDS for SQL Server is a managed Microsoft SQL Server database service that makes it easy to set up, operate, and scale SQL Server deployments in the cloud. Amazon RDS takes away the time-consuming database administration activities so that you can focus on your schema design, query construction, query optimization, and building your application.

There are several options for migrating to Amazon RDS for SQL Server from SQL Server on Amazon EC2 or on-premises. You can use the native backup and restore option, or you can use the replication option through the AWS Database Migration Service (AWS DMS). The replication option saves you time by bulk-loading the tables and then capturing the data changes along the way. In this post, we look at how you can use the AWS Schema Conversion Tool (AWS SCT) and AWS DMS to migrate SQL Server to Amazon RDS for SQL Server.

Source and destination database
In this exercise, the source database is created from the install-on-prem.sql script in the Amazon Web Services – Labs GitHub repository, which all the examples in this post use. The destination is a SQL Server 2014 Amazon RDS for SQL Server instance. The source database can also be an on-premises SQL Server. When replicating from an on-premises SQL Server, ensure that you have either AWS Direct Connect or a VPN connection to the virtual private cloud (VPC) where your Amazon RDS for SQL Server instance resides. This provides added security and better replication performance.

AWS DMS replication instance and creating the AWS DMS endpoints
For this procedure, we created an AWS DMS replication instance that is used for the full load and change data capture (CDC) activities through the AWS DMS console. This server resides in the same VPC as the Amazon RDS for SQL Server instance. In AWS DMS, we created two endpoints: one for the source SQL Server and another for the target Amazon RDS for SQL Server.

AWS DMS loads your data to the destination table by table. In this case, because we are doing a homogeneous migration, we use AWS DMS to create only the destination tables and objects minimally needed for an efficient data migration. After the full load is completed, right before starting the CDC, we apply the dependencies, constraints, and remaining objects to the destination database.


Querying on Multiple Attributes in Amazon DynamoDB

Amazon DynamoDB is a non-relational key-value store database that provides consistent single-digit millisecond response times for reading and writing, and scales virtually without bound. But as in any key-value store, it can be tricky to store data in a way that allows you to retrieve it efficiently. The most efficient method is to fetch the exact key of the item that you’re looking for. But if you need to search for data based on some attribute that isn’t part of the key, you might wonder how to get the exact search result in the most cost-effective and efficient way.

DynamoDB provides filter expressions as one potential solution that you can use to refine the results of a Query operation. Although filtering is done on the server side before results are sent back, the read costs are calculated on the Query operation before the filter is applied. In some cases, the cost may be too high.

In this post, we discuss an alternative approach to querying on multiple attributes in DynamoDB, which is to concatenate attributes on a key. This approach allows you to get exactly the item you want, and it works great when you have a read pattern that requires all attribute values that would exist in the key to be present when you fetch an item from your DynamoDB table.

To query an item on multiple attributes without filter expressions, you must create a new global secondary index (GSI) with a new attribute as the partition key that contains the concatenation of the attributes for your query. The issue with this method is that the partition key for the GSI is a new attribute and is empty for all items in the table. If a partition key for a GSI has no value, it doesn’t exist in the index, and so the new GSI that you create is empty.

The solution is to retrofit the table with the concatenated attribute values for the new attribute so that your GSI is populated.

Problem in-depth
Suppose that you have the following item:

{
  id: "1234567890abcdef",
  resourceId: "1234567890abcdef",
  resourceName: "thing1",
  action: "viewed",
  accessedBy: "joe1",
  timestamp: "2017-04-01T00:00:00.000"
}

This item exists in the following table:


id (PrimaryKey) resourceId resourceName action accessedBy timestamp
111aaa 123bbb Thing1 viewed joe1 2017-05-01T00:00:00.000
111aab 123bbb Thing1 viewed jane 2017-05-01T00:00:01.000
111aac 123ccc Thing2 edited jane 2017-05-01T00:00:25.000

And the Java model is defined as follows:

@DynamoDBTable(tableName = "table-ResourceAccessAudit")
public class Audit {
    private String id;
    private String resourceId;
    private String resourceName;
    private String action;
    private String accessedBy;
    private String timestamp;
    public String getId() {return id;}
    public void setId(String id) {this.id = id;}
    public String getResourceId() {return resourceId;}
    public void setResourceId(String resourceId) {this.resourceId = resourceId;}
    public String getResourceName() {return resourceName;}
    public void setResourceName(String resourceName) {this.resourceName = resourceName;}
    public String getAction() {return action;}
    public void setAction(String action) {this.action = action;}
    public String getAccessedBy() {return accessedBy;}
    public void setAccessedBy(String accessedBy) {this.accessedBy = accessedBy;}
    public String getTimestamp() {return timestamp;}
    public void setTimestamp(String timestamp) {this.timestamp = timestamp;}
}
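The retrofit step then computes the concatenated attribute for each existing item so that the new GSI gets populated. A minimal sketch in Python follows; the attribute name `resourceId-action-accessedBy` and the `#` delimiter are illustrative choices, not prescribed by DynamoDB:

```python
# Sketch: build the concatenated attribute used as the GSI partition
# key, then write it back onto an existing item (the retrofit step).
# Attribute name and "#" delimiter are illustrative choices.

def composite_key(resource_id, action, accessed_by, sep="#"):
    """Concatenate the attributes a query must supply into one key."""
    return sep.join([resource_id, action, accessed_by])

item = {
    "id": "111aaa",
    "resourceId": "123bbb",
    "action": "viewed",
    "accessedBy": "joe1",
}
# Retrofit: add the new attribute so the item appears in the GSI.
item["resourceId-action-accessedBy"] = composite_key(
    item["resourceId"], item["action"], item["accessedBy"])
print(item["resourceId-action-accessedBy"])  # 123bbb#viewed#joe1
```

A query that knows all three attribute values can then fetch the item directly from the GSI, with no filter expression and no wasted read capacity.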


Automating Cross-Region and Cross-Account Snapshot Copies with the Snapshot Tool for Amazon Aurora

This post covers how to install the Snapshot Tool for Amazon Aurora to automate copying snapshots across AWS accounts and Regions.

Although this is not a comprehensive list, here are a few situations where you might find this tool useful:

  • To keep a copy of your data in a different account for security purposes.
  • To keep a copy of your data in a different Region for disaster recovery.
  • As an easy way to update development/testing environments with fresh production data when development/testing runs on a different account than production.

The Snapshot Tool for Aurora automates the tasks of creating manual snapshots, copying them into a different account and a different Region, and deleting them after a specified number of days. The tool also allows you to specify the backup schedule (at what times and how often) and a retention period in days.

If you want to see these features as a part of the Amazon RDS service, please leave your feedback in the comments section.

This post assumes that you are familiar with Amazon RDS and Amazon Aurora backups. For information about managing snapshots, refer to the Amazon RDS documentation.

How the Snapshot Tool for Aurora works
The Snapshot Tool for Aurora uses AWS Lambda functions to run the create, copy, and delete tasks. These tasks are orchestrated by AWS Step Functions to make sure that each of them is retried, with the scheduling powered by Amazon CloudWatch Events. If any of these tasks ends in an error, it triggers a CloudWatch alarm, which pushes a message to an Amazon SNS topic. You can subscribe to these SNS topics to receive an email, trigger a Lambda function of your own, post a message to a Slack channel, and so on.


Automating SQL Caching for Amazon ElastiCache and Amazon RDS

Our guest bloggers are from Heimdall Data, an AWS partner that offers “DBA assistant” tools (you can find them in the AWS Marketplace). These tools include automated caching that combines Amazon ElastiCache for Redis with relational databases, including MySQL, MariaDB, PostgreSQL, Oracle, and Microsoft SQL Server. Let them step you through how to use their tools to automate the creation and management of a Redis query cache using ElastiCache and Amazon RDS (or Amazon Aurora).

Ramon Lawrence, Heimdall Data

Slow-performing applications reduce user engagement and customer satisfaction, and eventually lower revenue. Application-database inefficiency (for example, network latency or slow queries) is a primary cause of performance bottlenecks. This post shows how Heimdall Data’s auto-caching solution with ElastiCache improves performance with no code changes, while preventing stale cache data.

Various database systems provide different features and performance. Systems such as Amazon Aurora provide dynamic scaling and performance. Another AWS Database Blog post covers horizontal (more servers) and vertical (larger servers) scaling and the trade-offs. Database scaling can be costly, especially for commercially licensed databases.

Others have used Amazon ElastiCache to improve responsiveness. Despite the simplicity of ElastiCache, developers are still challenged to know what to cache and what to invalidate, and to ensure that all data is up to date. This requires manual application code changes and database modifications.

We can now automate caching and invalidation in ElastiCache with the Heimdall Data SQL caching solution.

The Heimdall Data Access Platform (HDAP) has two components: the Heimdall Central Manager (HCM), and the Heimdall Data Access Layer (HDAL). The Heimdall system is an intelligent query routing and caching data access layer that is installed in a distributed way on each Amazon EC2 instance.

Heimdall Data software packaging includes the following:

  • Database Proxy for Amazon RDS: Aurora, MySQL, SQL Server 2008+, PostgreSQL
  • JDBC Driver: Gives access to Oracle and any other JDBC-compliant database

For deployment, the only application-level change is to modify the host and port or JDBC URL to route through the Heimdall proxy/driver. The following is a sample architecture diagram for the MySQL proxy configuration:

In proxy mode (the focus of this article), there are two modes of deployment:

  • Distributed mode: A proxy resides on each Amazon EC2 application instance, for optimal performance (shown in the preceding diagram).
  • Centralized mode: One Amazon EC2 instance proxy services many application servers.

The proxy provides two levels of caching: 1) locally on the application server, and 2) on ElastiCache (akin to an L1/L2 cache). As SQL is sent from the application to the database, the proxy responds from the cache and routes requests to different servers (for use in load-balancing and read/write split). All this functionality is provided by Heimdall Data, requiring zero code changes.
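The two-tier lookup described above can be sketched as follows. This is an illustrative model of an L1/L2 cache, not Heimdall's implementation; the L2 dict stands in for ElastiCache, where a Redis client would be used in practice:

```python
# Illustrative sketch of a two-tier (L1/L2) SQL result cache lookup.
# Not Heimdall's implementation: the L2 dict stands in for ElastiCache.

class TwoTierCache:
    def __init__(self):
        self.l1 = {}  # local, per-application-server cache (fastest)
        self.l2 = {}  # shared cache, one network hop (ElastiCache in practice)

    def get(self, sql, fetch_from_db):
        if sql in self.l1:             # L1 hit: local memory
            return self.l1[sql]
        if sql in self.l2:             # L2 hit: promote into L1
            self.l1[sql] = self.l2[sql]
            return self.l2[sql]
        result = fetch_from_db(sql)    # miss: go to the database
        self.l1[sql] = self.l2[sql] = result
        return result

cache = TwoTierCache()
calls = []
db = lambda q: calls.append(q) or [("row1",)]
cache.get("SELECT * FROM t", db)
cache.get("SELECT * FROM t", db)   # served from L1; no second DB call
print(len(calls))  # 1
```

Invalidation (evicting entries from both tiers when the underlying tables change) is the part that Heimdall automates; in this sketch it would amount to deleting the affected keys from both dicts.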


Migrating a SQL Server Database to a MySQL-Compatible Database Engine

This post provides an overview of how you can migrate your Microsoft SQL Server database to a MySQL-compatible database engine such as Amazon RDS for MySQL, Amazon RDS for MariaDB, or Amazon Aurora MySQL.

The following are the two main parts of a database migration:

  1. Schema conversion: Converting the schema objects is usually the most time-consuming operation in a heterogeneous database migration. It is the foundation of the database and must be handled in a well-planned manner. If the schema conversion is done appropriately, a major milestone of the heterogeneous migration is complete.
  2. Data migration: The base data elements are the building blocks laid on the schema foundation mentioned previously. If the foundation is laid properly, arranging these blocks during the migration is comparatively simple when best practices are followed.

In this post, we go over how to use the AWS Schema Conversion Tool (AWS SCT) and the AWS Database Migration Service (AWS DMS) to migrate a SQL Server database to popular MySQL-based database engines like Aurora MySQL, MySQL, and MariaDB. In this post, we refer to all these database engines as MySQL.

When it comes to migrating a SQL Server database to a MySQL-compatible database, the database object semantics between SQL Server and MySQL are similar. However, there are major architectural differences that must be considered when you’re migrating from SQL Server to MySQL. For example, in MySQL, “database” and “schema” both share the same meaning. There are no functional differences between a database and a schema. The following example illustrates this point:

mysql> create schema test01;
Query OK, 1 row affected (0.01 sec)

mysql> create database test01;
ERROR 1007 (HY000): Can't create database 'test01'; database exists

As the example shows, “database” and “schema” are synonymous and don’t have any separate meaning. When it comes to referring to a table within a database, the complete table identifier in MySQL looks like databasename.tablename.

SQL Server, on the other hand, assigns distinct meanings and functionality to the “database” and “schema” keywords. In SQL Server, a database is the main container that holds all objects, data, and log files. A schema is an object within a specific database that logically groups the other database objects together. SQL Server uses the schema name dbo by default, but you can change this to meet organizational, functional, or business needs. The complete table identifier in SQL Server is Databasename.Schemaname.tablename.
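This difference means that every three-part SQL Server identifier must collapse to a two-part MySQL identifier during conversion. The sketch below illustrates one common mapping, in which the SQL Server schema becomes the MySQL database; the exact convention AWS SCT applies can be configured, and the names here are illustrative:

```python
# Sketch: collapse a three-part SQL Server identifier into a two-part
# MySQL identifier, mapping schema -> database. One common convention;
# names are illustrative.

def to_mysql_identifier(sqlserver_identifier):
    """Map Databasename.Schemaname.tablename -> schemaname.tablename."""
    database, schema, table = sqlserver_identifier.split(".")
    return f"{schema.lower()}.{table.lower()}"

print(to_mysql_identifier("Sales.dbo.Orders"))  # dbo.orders
```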


Using Amazon Redshift for Fast Analytical Reports

With digital data growing at an incomprehensible rate, enterprises are finding it difficult to ingest, store, and analyze the data quickly while keeping costs low. Traditional data warehouse systems need constant upgrades in terms of compute and storage to meet these challenges.

In this post, we provide insights into how AWS Premier Partner Wipro helped their customer (a leading US business services company) move their workload from an on-premises data warehouse to Amazon Redshift. This move enabled the company to reap the benefits of scalability and performance without affecting how their end users consume reports.

Current data warehouse environment and challenges faced
The customer was running a commercial enterprise data warehouse that contained aggregated data from different internal reporting systems across geographies. Their primary goal was to provide quick and accurate analytics to drive faster business decisions. The user base was distributed globally. Meeting this goal was difficult due to the following challenges:

  • The data warehouse (5 TB) was growing at over 20 percent year over year (YoY), with higher growth expected in the future. This growth required them to keep upgrading the hardware to meet the storage and compute needs, which was expensive.
  • The Analytical Dashboard experienced performance issues because of the growing data and user base.
  • Licensing was based on CPU cores, so adding hardware to support the growth also required additional investment in licenses, further spiraling the costs.

Migrating to Amazon Redshift
Wipro used their Cloud Data Warehouse Readiness Solution (CDRS) strategy to migrate data to Amazon Redshift. Using CDRS, they migrated 4.x billion records to Amazon Redshift. CDRS has a Source Analyzer that created the target data model and generated the data definition language (DDL) for tables that needed to be migrated to Amazon Redshift. The architecture included Talend (a data integration platform) running on Amazon EC2 to extract data from various source systems for ongoing change data capture (CDC) and then load it to Amazon Redshift. Talend has several built-in connectors to connect to various sources and extract the data.

The following diagram shows the architecture of this migration setup: