AWS Database Blog

Creating highly available IBM Db2 databases in AWS

by Nicholas Ansell

Many AWS customers run mission-critical workloads using the IBM Db2 database platform. Typically, these workloads require a highly available configuration to make sure that the database is available in the event of a node or site failure.

The traditional IBM method to achieve high availability is to use shared storage and virtual IP addresses, which are orchestrated by Tivoli System Automation for Multi-Platforms (TSAMP). This blog post presents a fully automated solution using native IBM and AWS technologies. Now you can run mission-critical Db2 workloads in AWS and be confident that Db2 databases will be highly available.

Note: The IBM Db2 version implemented in this guide is a fully featured 90-day trial of Db2 for Linux, UNIX, and Windows version 11. After the trial period ends, you can select the required Db2 edition when purchasing and installing the associated license files. Editions supported by this implementation are Db2 Advanced Enterprise Server Edition, Db2 Enterprise Server Edition, Db2 Advanced Workgroup Server Edition, and Db2 Workgroup Server Edition. For more information on the features of each edition, see this article on the IBM website.

By carrying out the steps in this article, we create a highly available Db2 database that spans two Availability Zones (AZs). Data is replicated between the primary instance in AZ1 and the standby instance in AZ2 using IBM High Availability Disaster Recovery (HADR) replication. If the primary node becomes unavailable, TSAMP detects this and fails over to the standby instance. Db2 client applications automatically reconnect to the standby instance using the IBM Automatic Client Reroute (ACR) feature.

Initial setup
This solution deploys into a default virtual private cloud (VPC) in AWS. A default VPC is created automatically in each AWS Region when your AWS account is created. However, if you no longer have a default VPC, before proceeding make sure that you have the following (you can verify these prerequisites with the AWS CLI sketch after the list):

  • A VPC with two public subnets. Each subnet should be placed in a different Availability Zone within the same AWS Region.
  • An internet gateway attached to the VPC. Each subnet should have a route to the internet through the internet gateway.
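
If you want to check these prerequisites from the command line, the following AWS CLI sketch lists the default VPC, its subnets, and the attached internet gateway. It assumes that the AWS CLI is installed and configured for the target Region; replace <vpc-id> with the VPC ID returned by the first command.

# Show the ID of the default VPC in the current Region.
aws ec2 describe-vpcs --filters Name=isDefault,Values=true --query 'Vpcs[].VpcId' --output text

# List the subnets in that VPC, with their Availability Zones.
aws ec2 describe-subnets --filters Name=vpc-id,Values=<vpc-id> --query 'Subnets[].[SubnetId,AvailabilityZone]' --output table

# Confirm that an internet gateway is attached to the VPC.
aws ec2 describe-internet-gateways --filters Name=attachment.vpc-id,Values=<vpc-id> --output table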

First, we create an Amazon S3 bucket and upload the IBM Db2 installation file, as described following (an equivalent AWS CLI sketch appears after the steps). The S3 bucket is also used to exchange information between the primary and standby instances during the automated build cycle.

  1. Create an S3 bucket in the same AWS Region where you want to deploy the Db2 solution. No additional settings are required within the bucket configuration.
  2. In the bucket, create a folder (prefix) named db2.
  3. Download the IBM Db2 90-day trial from the IBM website.
  4. Upload the downloaded gzip archive to the db2 folder in the S3 bucket that you created earlier.
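
If you prefer the command line, the following AWS CLI sketch performs the same steps. The bucket name, Region, and file name are placeholders; substitute your own values (the db2 prefix is created implicitly by the upload).

# Create the S3 bucket in the target Region.
aws s3 mb s3://<your-db2-bucket> --region <your-region>

# Upload the downloaded Db2 trial archive into the db2 folder (prefix).
aws s3 cp <downloaded-db2-trial>.tar.gz s3://<your-db2-bucket>/db2/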

Building the highly available Db2 solution
The solution implementation is fully automated using an AWS CloudFormation template. Open the CloudFormation console to get started.

The CloudFormation stack requires the following parameters:

  • VpcId: Enter the ID of the VPC to create the solution in.
  • Stack Name: Enter a meaningful name for the stack, for example Db2-hadr.
  • KeyPairName: Choose an existing Amazon EC2 key pair to access the Db2 EC2 instances.
  • LinuxInstanceType: Choose an appropriate instance type for the workload.
  • DB2VolumeSize: Enter the size of the Amazon EBS volume to host the database.
  • S3Bucket: Enter the name of the S3 bucket created earlier.
  • DB2Instance: Leave this parameter as the default, or enter a Db2 instance name.
  • DB2Database: Leave this parameter as the default, or enter a Db2 database name.
  • SyncMode: Select the required synchronization mode.
  • SSHCidr: Enter a valid CIDR block. Machines within this CIDR block are permitted to connect to the EC2 instances using Secure Shell (SSH).
  • DB2ClientCidr: Enter a valid CIDR block. Machines within this CIDR block are permitted to connect to the Db2 database.
  • PrimaryEC2InstanceName: Leave this parameter as the default, or enter an EC2 instance name.
  • PrimarySubnetId: Enter the ID of the subnet to place the primary EC2 instance in.
  • SecondaryEC2InstanceName: Leave as default, or enter an EC2 instance name.
  • SecondarySubnetId: Enter the ID of the subnet to place the secondary EC2 instance in.

When you’ve entered the parameters, do the following:

  1. Choose Next.
  2. On the next screen, enter any required tags, an IAM role, or any advanced options, and then choose Next.
  3. Review the details on the final screen, and then choose Create to start building the solution.
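
If you prefer to launch the stack from the command line, the following sketch shows an equivalent aws cloudformation create-stack call. The template file name and all parameter values are placeholders for illustration; parameters left at their defaults are omitted.

aws cloudformation create-stack \
  --stack-name Db2-hadr \
  --template-body file://<db2-hadr-template> \
  --parameters \
    ParameterKey=VpcId,ParameterValue=<vpc-id> \
    ParameterKey=KeyPairName,ParameterValue=<key-pair-name> \
    ParameterKey=LinuxInstanceType,ParameterValue=<instance-type> \
    ParameterKey=DB2VolumeSize,ParameterValue=<volume-size-gb> \
    ParameterKey=S3Bucket,ParameterValue=<your-db2-bucket> \
    ParameterKey=SyncMode,ParameterValue=NEARSYNC \
    ParameterKey=SSHCidr,ParameterValue=<ssh-cidr> \
    ParameterKey=DB2ClientCidr,ParameterValue=<client-cidr> \
    ParameterKey=PrimarySubnetId,ParameterValue=<primary-subnet-id> \
    ParameterKey=SecondarySubnetId,ParameterValue=<secondary-subnet-id>
# Add --capabilities CAPABILITY_IAM if the template creates IAM resources.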

This solution bootstraps each EC2 instance using user data. For more information, view the user data section within the CloudFormation template. To view the progress, connect to each EC2 instance using an SSH client and run the command tail -f /var/log/cloud-init-output.log.
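
For example, assuming an SSH key file and the Linux user name appropriate for your chosen AMI (ec2-user is shown only as an example), the commands look similar to the following.

# Connect to the instance, then tail the bootstrap log.
ssh -i <key-pair-file>.pem ec2-user@<instance-public-dns>
sudo tail -f /var/log/cloud-init-output.log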

Viewing the Db2 cluster details
When the bootstrap sequence within the CloudFormation template has finished, you can view the cluster details using three standard IBM commands: lssam, lsrpnode, and db2pd.

You can view the HADR configuration by running the following command on the primary and standby instances as the root user (replace testdb with your custom database name if you’ve entered one).

su - db2inst1 -c "db2pd -hadr -db testdb"

On each instance, you should see something similar to the following.

                            HADR_ROLE = PRIMARY
                          REPLAY_TYPE = PHYSICAL
                        HADR_SYNCMODE = NEARSYNC
                           STANDBY_ID = 1
                        LOG_STREAM_ID = 0
                           HADR_STATE = PEER
                           HADR_FLAGS = TCP_PROTOCOL
                  PRIMARY_MEMBER_HOST = ip-10-240-16-50
                     PRIMARY_INSTANCE = Db2inst1
                       PRIMARY_MEMBER = 0
                  STANDBY_MEMBER_HOST = ip-10-240-17-180
                     STANDBY_INSTANCE = Db2inst1
                       STANDBY_MEMBER = 0
                  HADR_CONNECT_STATUS = CONNECTED
             HADR_CONNECT_STATUS_TIME = 12/06/2017 15:31:15.222197 (1512574275)
          HEARTBEAT_INTERVAL(seconds) = 15
                     HEARTBEAT_MISSED = 0
                   HEARTBEAT_EXPECTED = 26
                HADR_TIMEOUT(seconds) = 60
        TIME_SINCE_LAST_RECV(seconds) = 3
             PEER_WAIT_LIMIT(seconds) = 0
           LOG_HADR_WAIT_CUR(seconds) = 0.000
    LOG_HADR_WAIT_RECENT_AVG(seconds) = 0.000460
   LOG_HADR_WAIT_ACCUMULATED(seconds) = 0.056
                  LOG_HADR_WAIT_COUNT = 121
SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, 332800
SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, 236184
            PRIMARY_LOG_FILE,PAGE,POS = S0000000.LOG, 64, 45099263
            STANDBY_LOG_FILE,PAGE,POS = S0000000.LOG, 64, 45099263
                  HADR_LOG_GAP(bytes) = 0
     STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0000000.LOG, 64, 45099263
       STANDBY_RECV_REPLAY_GAP(bytes) = 1220384
                     PRIMARY_LOG_TIME = 12/06/2017 15:34:17.000000 (1512574457)
                     STANDBY_LOG_TIME = 12/06/2017 15:34:17.000000 (1512574457)
              STANDBY_REPLAY_LOG_TIME = 12/06/2017 15:34:17.000000 (1512574457)
         STANDBY_RECV_BUF_SIZE(pages) = 4300
             STANDBY_RECV_BUF_PERCENT = 0
           STANDBY_SPOOL_LIMIT(pages) = 25600
                STANDBY_SPOOL_PERCENT = 0
                   STANDBY_ERROR_TIME = NULL
                 PEER_WINDOW(seconds) = 120
                      PEER_WINDOW_END = 12/06/2017 15:39:47.000000 (1512574787)
             READS_ON_STANDBY_ENABLED = N

In this output, note the following parameters:

  • HADR_ROLE: This parameter indicates whether the instance is a primary or standby node.
  • HADR_STATE: This parameter should show a value of PEER.
  • HADR_CONNECT_STATUS: This parameter should show a value of CONNECTED.
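
To check just these three parameters quickly, you can filter the db2pd output. For example, the following one-liner (replace db2inst1 and testdb if you customized the instance or database names) prints only the lines of interest.

su - db2inst1 -c "db2pd -hadr -db testdb" | grep -E "HADR_ROLE|HADR_STATE|HADR_CONNECT_STATUS"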

You can list the TSA cluster nodes by running the lsrpnode command as the root user on either the primary or standby instance. As a result, you should see something similar to the following, which shows the name, status, and IBM Reliable Scalable Cluster Technology (RSCT) version of each node.

Name             OpState RSCTVersion 
ip-10-240-17-180 Online  3.2.1.2     
ip-10-240-16-50  Online  3.2.1.2 

You can view the TSA cluster resources by running the lssam command on the primary instance. As a result, you should see something similar to the following.

Online IBM.ResourceGroup:Db2_Db2inst1_Db2inst1_TESTDB-rg Nominal=Online
        '- Online IBM.Application:Db2_Db2inst1_Db2inst1_TESTDB-rs
                |- Online IBM.Application:Db2_Db2inst1_Db2inst1_TESTDB-rs:ip-10-240-16-50
                '- Offline IBM.Application:Db2_Db2inst1_Db2inst1_TESTDB-rs:ip-10-240-17-180
Online IBM.ResourceGroup:Db2_Db2inst1_ip-10-240-16-50_0-rg Nominal=Online
        '- Online IBM.Application:Db2_Db2inst1_ip-10-240-16-50_0-rs
                '- Online IBM.Application:Db2_Db2inst1_ip-10-240-16-50_0-rs:ip-10-240-16-50
Online IBM.ResourceGroup:Db2_Db2inst1_ip-10-240-17-180_0-rg Nominal=Online
        '- Online IBM.Application:Db2_Db2inst1_ip-10-240-17-180_0-rs
                '- Online IBM.Application:Db2_Db2inst1_ip-10-240-17-180_0-rs:ip-10-240-17-180
Online IBM.Equivalency:Db2_Db2inst1_Db2inst1_TESTDB-rg_group-equ
        |- Online IBM.PeerNode:ip-10-240-16-50:ip-10-240-16-50
        '- Online IBM.PeerNode:ip-10-240-17-180:ip-10-240-17-180
Online IBM.Equivalency:Db2_Db2inst1_ip-10-240-16-50_0-rg_group-equ
        '- Online IBM.PeerNode:ip-10-240-16-50:ip-10-240-16-50
Online IBM.Equivalency:Db2_Db2inst1_ip-10-240-17-180_0-rg_group-equ
        '- Online IBM.PeerNode:ip-10-240-17-180:ip-10-240-17-180

The lssam command shows the resources and resource groups within the TSA cluster. The offline resource indicates that the database is not active on the standby node.

Testing the cluster
You can test the cluster by initiating a manual failover using the following command.

rgreq -o move <name of TSAMP resource group>

The following is an example.

rgreq -o move Db2_Db2inst1_Db2inst1_TESTDB-rg

After running the command, run the lssam command a few times to watch the database fail over to the standby node.
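
Rather than rerunning lssam by hand, you can poll it in a simple loop as the root user. This is only a convenience sketch; press Ctrl+C to stop it.

# Refresh the TSA cluster view every five seconds.
while true; do clear; lssam; sleep 5; done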

Testing Automatic Client Reroute (ACR)
Next, we test ACR. For simplicity, we connect to the database using the DBA user from the Db2 client instance. However, before we can connect, the DBA user must have a password set on both the primary and standby instances. Assign a password to the DBA user by running the following command as the root user on each Db2 EC2 instance.

echo "<DBA user name>:<Password>" | chpasswd

The following is an example.

echo "db2inst1:Pass.123" | chpasswd

Next, connect to the Db2 client machine using SSH and perform the following steps. These steps connect to the primary instance and request database details. The primary instance returns an alternate server host name (the standby instance). An ACR-compatible Db2 client remembers the alternate host name and automatically connects to the alternate server if the primary instance becomes unavailable.

Run the following commands using the DBA user on the Db2 client:

  1. Create an alias for the Db2 database with the following command.
    db2 catalog tcpip node HAPNODE remote <NAME-OF-PRIMARY-INSTANCE> server <DB Listener Port>

    The following is an example.

    db2 catalog tcpip node HAPNODE remote ip-1-1-1-1 server 60000
  2. Catalog the database with the following command.
    db2 catalog db <DB-Name> at node HAPNODE

    The following is an example.

    db2 catalog db TESTDB at node HAPNODE
  3. Connect to the database with the following command.
    db2 connect to <DB-Name> user <Db2InstanceName> using <Password>

    The following is an example.

    db2 connect to TESTDB user db2inst1 using Pass.123
  4. Run the following command to show the name of the standby instance. This name appears in the Alternate server hostname field as shown in the output following.
    db2 list db directory

    The following is example output.

    Alternate server hostname            = ip-10-240-17-180
    Alternate server port number         = 60000

Summary
You now have a highly available Db2 database running on AWS. All transactions written to the primary instance are automatically synchronized to the standby instance using the synchronization method selected (ASYNC, NEARSYNC, or SYNC).

If the primary instance fails or becomes unavailable, TSAMP detects the failure and automatically fails over to the standby instance. Upon failover, Db2 clients detect that the primary instance is unavailable and automatically connect to the standby instance.

Before considering this solution for production use, you might need to make a few changes. For example, you might want to place the Db2 servers in private subnets. Other potential changes to consider are customizing options in the response file, or automatically configuring Db2 software licenses.
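
For example, applying a purchased Db2 license is typically a single db2licm call that you could add to the bootstrap scripts. The license file path shown here is a placeholder.

# Apply the license file as the instance owner, then verify it.
su - db2inst1 -c "db2licm -a /tmp/<db2-license-file>.lic"
su - db2inst1 -c "db2licm -l"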

Time to start migrating your mission-critical Db2-based applications to AWS!


About the Author

Nicholas Ansell is a senior consultant with AWS Professional Services. He works closely with customers to help realize their goals using AWS services.