
How to containerize legacy code into Red Hat OpenShift on AWS (ROSA)

Introduction

Enterprise customers have trained their IT staff on legacy programming languages, like COBOL, for decades. These programs have stood the test of time and still run many mission-critical business applications that are typical of these legacy platforms. While migration solutions such as AWS Blu Age and AWS Micro Focus Enterprise technology exist for legacy applications, they often require the customer to learn a new programming language. In this post, we show you how to containerize legacy applications on AWS with minimal effort.

The Existing Legacy Application

COBOL is still widely used in applications deployed on mainframe computers, which run large-scale batch and transactional processing jobs. The example COBOL application that we use in this post reads a comma-separated values (CSV) input file and generates a report. The input file is uploaded to the mainframe daily via File Transfer Protocol (FTP), and a COBOL application processes it as a batch job.

The following shows the example input file and the COBOL program that reformats it into a tabular report.

Example input file:

Alejandro, Rosalez, 123 Any Street, Any Town, NS, 1234
Akua, Mansa, 234 Some Street, Some Town, VI, 2345
Carlos, Salazar, 345 North Street, Any Town, WA, 9567 
Nikhil, Jayasha, 839 Some Crescent, Some Town, NT, 9023
Richard, Roe, 456 Left Street, Some Town, IR, 8934 
Kwesi, Manu, 567 Right Avenue, Any Town, SA, 1030 
Wang, Xiulan, 678 Some Street, Some Town, QL, 7890

The COBOL application:

>>SOURCE FORMAT FREE
IDENTIFICATION DIVISION.
PROGRAM-ID.  READ-CSV.
AUTHOR. Hantzley Tauckoor.
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT INPUT-FILE
    ASSIGN TO "/nfs_dir/input/info.csv"
    ORGANIZATION IS LINE SEQUENTIAL
    ACCESS MODE IS SEQUENTIAL.
    
    SELECT OUTPUT-FILE
    ASSIGN TO "/nfs_dir/output/output.txt"
    ORGANIZATION IS LINE SEQUENTIAL
    ACCESS MODE IS SEQUENTIAL.
    
DATA DIVISION.
FILE SECTION.
FD  INPUT-FILE          RECORD CONTAINS 80 CHARACTERS.
01  INPUT-RECORD        PIC X(80).
FD  OUTPUT-FILE         RECORD CONTAINS 160 CHARACTERS.
01  OUTPUT-RECORD.
    05 OUT-LAST-NAME     PIC X(25).
    05 FILLER            PIC X(5).
    05 OUT-FIRST-NAME    PIC X(15).
    05 FILLER            PIC X(5).
    05 OUT-STREET        PIC X(30).
    05 FILLER            PIC X(5).
    05 OUT-CITY          PIC X(15).
    05 FILLER            PIC X(5).
    05 OUT-STATE         PIC XXX.
    05 FILLER            PIC X(5).
    05 OUT-ZIP           PIC X(10).
    05 FILLER            PIC X(37).
    
WORKING-STORAGE SECTION.
01  SEPARATE-IT.
    05 LAST_NAME        PIC X(25).
    05 FIRST_NAME       PIC X(15).
    05 STREET_ADDR      PIC X(30).
    05 CITY             PIC X(15).
    05 STATE            PIC XXX.
    05 ZIP              PIC X(10).
PROCEDURE DIVISION.
START-ROUTINE.
    OPEN INPUT INPUT-FILE.
    OPEN OUTPUT OUTPUT-FILE.
READ-ROUTINE.
    MOVE SPACES TO INPUT-RECORD.
    READ INPUT-FILE AT END GO TO END-ROUTINE.
    MOVE SPACES TO SEPARATE-IT.
    UNSTRING INPUT-RECORD DELIMITED BY ","
       INTO LAST_NAME, FIRST_NAME, STREET_ADDR,
       CITY, STATE, ZIP.
    MOVE SPACES TO OUTPUT-RECORD.
    MOVE LAST_NAME TO OUT-LAST-NAME.
    MOVE FIRST_NAME TO OUT-FIRST-NAME.
    MOVE STREET_ADDR TO OUT-STREET.
    MOVE CITY TO OUT-CITY.
    MOVE STATE TO OUT-STATE.
    MOVE ZIP TO OUT-ZIP.
    WRITE OUTPUT-RECORD.
    GO TO READ-ROUTINE.
END-ROUTINE.
    CLOSE INPUT-FILE.
    CLOSE OUTPUT-FILE.
    STOP RUN.

The output file:

Alejandro    Rosalez    123 Any Street       Any Town     NS    1234
Akua         Mansa      234 Some Street      Some Town    VI    2345
Carlos       Salazar    345 North Street     Any Town     WA    9567
Nikhil       Jayasha    839 Some Crescent    Some Town    NT    9023
Richard      Roe        456 Left Street      Some Town    IR    8934
Kwesi        Manu       567 Right Avenue     Any Town     SA    1030
Wang         Xiulan     678 Some Street      Some Town    QL    7890

Solution overview

We are going to keep the code unchanged and wrap it in a container that runs on a Red Hat OpenShift Service on AWS (ROSA) cluster. The application runs as a cron job that checks a specific directory on a shared file system every minute, processes any input files it finds, and places the generated output into another directory on the same file system. We use an Amazon Elastic File System (Amazon EFS) file system to store the input CSV and output files so that they remain accessible to the programs that provide the input files.

The following diagram shows the solution architecture:

Diagram of the system architecture

These are the steps that we take to implement this solution:

  1. Verify the feasibility of running the COBOL code on Linux
  2. Containerize the code
  3. Prepare the AWS environment (code and container repositories, shared filesystem, and the OpenShift cluster)
  4. Deploy the code on ROSA
  5. Test the application

In the following sections, we show how each of the above-mentioned steps is performed. For further information, see the Red Hat OpenShift Service on AWS (ROSA) documentation.

1. The feasibility of running COBOL on Linux

Running COBOL applications is not limited to IBM operating systems. A few open-source COBOL compilers are available that build easily on Linux. IBM's official COBOL compiler is also an option, but for this demo we use GnuCOBOL. While GnuCOBOL can be built from source, on some distributions (for example, Ubuntu) it installs easily through the package manager:

$ apt install gnucobol -y

$ dpkg -L gnucobol  | grep "cobc$"
/usr/bin/cobc

$ cobc -V
cobc (GnuCOBOL) 2.2.0
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Keisuke Nishida, Roger While, Ron Norman, Simon Sobisch, Edward Hart
Built     Jul 17 2018 20:29:40
Packaged  Sep 06 2017 18:48:43 UTC
C version "8.1.0"

Compile and run a test code:

$ cat test-program.cbl
    IDENTIFICATION DIVISION.
    PROGRAM-ID. test-program.

    PROCEDURE DIVISION.
    DisplayPrompt.
        DISPLAY "Hello World!".
        STOP RUN.

$ cobc -free -x test-program.cbl

$ ./test-program
Hello World!

With the COBOL compiler working on Linux, we can provide the same environment inside a Linux container as well.

2. Containerization

COBOL is a compiled language, which means the program must be recompiled whenever the source code changes. Depending on the use case, we can choose to include the COBOL compiler in the container image or leave it out. If it is included, the container builds the executable program from the source code at run time, which adds some latency. Let’s discuss the pros and cons of each approach.

 Create a container image based on the compiled and executable version of the program:

Pro: Faster container creation. Once the container is created from the image, it will be ready to execute the code immediately because the code is already pre-compiled.

Con: A new container image needs to be created if the code is modified. In other words, a pipeline will be needed to automatically compile the program and build an updated version of the container image.

Create a container image which can compile the program:

Pro: Less complexity, because no pipeline is needed to rebuild the container image whenever the code changes.

Con: The code is compiled every time the container runs, even if it has not been modified. While this overhead might be acceptable for long-running containers, it is unnecessary for short-lived ones.

In this post, we chose the second approach because of its simplicity. For comparison, a sketch of what the first approach could look like follows.
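For reference, here is a minimal, hypothetical sketch of the first approach, where the program is compiled outside the container (for example, in a CI job) and only the executable is baked into a slim runtime image. The file names, image tag, and the GnuCOBOL runtime package name (libcob4) are assumptions that would need to be adapted to your pipeline and base image:

#!/usr/bin/bash
# Hypothetical CI build step for the pre-compiled approach.
set -euo pipefail

# Compile the COBOL program once, outside the container
cobc -free -x demo.cbl -o demo

# Write a slim runtime Dockerfile that carries only the executable.
# NOTE: the GnuCOBOL runtime library package (libcob4 here) varies by distribution.
cat > Dockerfile.runtime <<'EOF'
FROM ubuntu:latest
RUN apt-get update && apt-get install -y libcob4 && rm -rf /var/lib/apt/lists/*
COPY demo /home/demo
CMD ["/home/demo"]
EOF

# Build and tag the runtime image (the tag is a placeholder)
docker build -f Dockerfile.runtime -t my-registry/cobol-app:precompiled .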

The following steps explain how to create a container image capable of compiling and running COBOL programs:

Package the code into a Docker image:

To better understand how the Dockerfile is constructed, let’s review the different pieces and elements first:

/nfs_dir: The shared file system that stores the input CSV files in /nfs_dir/input/ and the output reports in /nfs_dir/output/.

demo.cbl: The COBOL program that transforms the input file and saves the result to the output file. We use the same code that is already running on the mainframe.

batch-process.sh: A shell script that finds the input files in the /nfs_dir/input directory and generates an output file for each of them.

#!/usr/bin/bash

cobol_code="/home/demo.cbl"
shared_fs="/nfs_dir"
executable_program="${shared_fs}/executable_program"

# Compile the COBOL source (if present) into an executable on the shared file system
if [ -f ${cobol_code} ]
then
        /usr/bin/cobc -free -x ${cobol_code} -o ${executable_program}
fi

# Process, then remove, every file found in the input directory
if [ -f ${executable_program} ]
then
        input_dir=${shared_fs}/input
        output_dir=${shared_fs}/output
        ls ${input_dir} | while read file_to_process
        do
                ${executable_program} ${input_dir}/${file_to_process}
                rm -f ${input_dir}/${file_to_process}
                echo "${input_dir}/${file_to_process} has been processed."
        done
else
        echo "${executable_program} not found."
fi

echo "The batch job has been completed at `date`"

cobol-cron-job: A Linux crontab configuration file that causes batch-process.sh to be executed every minute:

* * * * * /home/batch-process.sh >> /var/log/cron.log 2>&1
#

The following sample Dockerfile builds the container image. In the COPY instructions, the necessary files are copied from the current directory on our workstation into the image. In the RUN instruction, the required packages are installed and the cron mechanism is configured. Finally, the CMD instruction defines what executes when the container runs: it starts the cron daemon and tails cron.log to the console:

Dockerfile:

FROM ubuntu:latest
COPY cobol-cron-job /etc/cron.d/cobol-cron-job
COPY batch-process.sh demo.cbl /home/
RUN apt-get update && apt-get install gnucobol cron -y && \
    chmod 0744 /etc/cron.d/cobol-cron-job && \
    chmod +x /home/batch-process.sh && \
    crontab /etc/cron.d/cobol-cron-job && \
    touch /var/log/cron.log 
CMD cron && tail -f /var/log/cron.log

Create the Docker image:

image="111122223333.dkr.ecr.ap-southeast-2.amazonaws.com/my-ecr-repo:cobol"
sudo docker image rm $image --force 2>/dev/null
sudo docker build --no-cache -t $image .

Test the Docker image locally:

# Delete the container if it already exists
sudo docker rm `docker ps -a | grep test_container | awk '{print $1}'` --force 

# Create a new container
sudo docker run --name test_container $image &

# Wait for at least one cron job to be completed after a minute:
sleep 65 

# Check the cron log:
sudo docker exec -it test_container tail -f /var/log/cron.log

Here is sample output:

The batch job has been completed at Fri Jun  17 02:34:01 UTC 2022

Push the image to the repository. We will be using Amazon Elastic Container Registry (ECR):

# Login to ECR
aws ecr get-login-password --region ap-southeast-2 | sudo docker login --username AWS --password-stdin 111122223333.dkr.ecr.ap-southeast-2.amazonaws.com

# Upload the container image to ECR:
sudo docker push $image

3. Preparing an Amazon EFS file system for the cluster

Amazon EFS is a simple, serverless, elastic file system that makes it easy to set up, scale, and cost-optimize file storage. For more information about Amazon EFS and how to set it up from the AWS Management Console, follow the steps in this AWS blog post. In our scenario, Amazon EFS is the shared Network File System (NFS) that stores the input and output files of the COBOL application running on ROSA. We also need to create an access point for the file system. According to the EFS documentation, Amazon EFS access points are application-specific entry points into an Amazon EFS file system that make it easier to manage application access to shared datasets.
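If you prefer to script this instead of using the console, the file system and access point can also be created with the AWS CLI. The following is a minimal sketch; the creation token, subnet ID, security group ID, POSIX user, and the /cobol root directory are illustrative assumptions to adapt to your environment:

# Create the file system and capture its ID
FS_ID=$(aws efs create-file-system \
  --creation-token cobol-demo-efs \
  --encrypted \
  --tags Key=Name,Value=cobol-demo-efs \
  --query FileSystemId --output text)

# Create a mount target in a subnet reachable from the ROSA worker nodes
# (subnet and security group IDs are placeholders)
aws efs create-mount-target \
  --file-system-id "${FS_ID}" \
  --subnet-id subnet-0123456789abcdef0 \
  --security-groups sg-0123456789abcdef0

# Create an access point rooted at an application-specific directory
aws efs create-access-point \
  --file-system-id "${FS_ID}" \
  --posix-user Uid=1000,Gid=1000 \
  --root-directory 'Path=/cobol,CreationInfo={OwnerUid=1000,OwnerGid=1000,Permissions=755}'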

Install the AWS EFS Operator on ROSA:

Make sure the AWS EFS Operator has been installed from the cluster’s OperatorHub. This enables the ROSA cluster to interact with Amazon EFS. For more information about operators, please refer to the OpenShift documentation.

Image showing the AWS EFS Operator installed from OperatorHub.
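As a quick check from the command line, you can confirm that the operator’s ClusterServiceVersion is present (the namespace and exact name depend on how the operator was installed):

# Look for the AWS EFS Operator among the installed operators
oc get csv -A | grep -i efs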

Create a shared volume in the ROSA cluster based on Amazon EFS:

To create an Amazon EFS-based shared volume through OpenShift’s EFS Operator, we need the file system and access point IDs from the Amazon EFS console or AWS Command Line Interface (CLI):

aws efs describe-file-systems | jq .FileSystems[].FileSystemId
"fs-37e7e20f"

aws efs describe-access-points |jq .AccessPoints[].AccessPointId
"fsap-053d9cb60c9c1eec1"

With the file system and access point IDs ready, we create a SharedVolume object in the cluster. For more information, please visit the ROSA documentation.

Create a SharedVolume:

$ cat <<EoF > efs-shared-volume1.yaml
apiVersion: aws-efs.managed.openshift.io/v1alpha1
kind: SharedVolume
metadata:
  name: efs-shared-volume1
  namespace: rosaproject1
spec:
  accessPointID: fsap-0fa06aa2d435e5750
  fileSystemID: fs-6728f5e7

EoF

oc create -f efs-shared-volume1.yaml
sharedvolume.aws-efs.managed.openshift.io/efs-shared-volume1 created

As a result, a PersistentVolumeClaim is created, which can then be mounted in pods:

oc get pvc pvc-efs-shared-volume1
NAME                     STATUS   VOLUME                               CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pvc-efs-shared-volume1   Bound    pv-rosaproject1-efs-shared-volume1   1Gi        RWX            efs-sc         24h

4. Deploying the application on ROSA

The container image we created is stored in a private repository in Amazon ECR, so we need to create a Secret in OpenShift to pull the image. With the shared file system already in place, the last task is to define a pod and launch the application in it.

Don’t forget to modify the security group of the Amazon EFS mount target to allow inbound access from the ROSA cluster. The worker nodes have a specific security group that can be allow-listed on the Amazon EFS side, as shown in the sketch below.
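A minimal sketch of that change with the AWS CLI; both security group IDs below are placeholders for the groups attached to the ROSA worker nodes and to the EFS mount target:

WORKER_SG="sg-0123456789abcdef0"   # security group attached to the ROSA worker nodes (placeholder)
EFS_SG="sg-0fedcba9876543210"      # security group attached to the EFS mount target (placeholder)

# Allow NFS (TCP 2049) from the worker nodes to the EFS mount target
aws ec2 authorize-security-group-ingress \
  --group-id "${EFS_SG}" \
  --protocol tcp \
  --port 2049 \
  --source-group "${WORKER_SG}"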

Create a Secret:

Save the Amazon ECR password to ~/.docker/config.json:

$ aws ecr get-login-password --region ap-southeast-2 | sudo docker login --username AWS --password-stdin 111122223333.dkr.ecr.ap-southeast-2.amazonaws.com

Create a generic secret in OpenShift from the credentials file created above:

$ sudo cp /root/.docker/config.json /tmp; sudo chmod +r /tmp/config.json

$ oc create secret generic ecr-secret --from-file=.dockerconfigjson=/tmp/config.json --type=kubernetes.io/dockerconfigjson 

Now pods can pull images from ECR using the newly created secret:

$ oc get secret ecr-secret
NAME         TYPE                             DATA   AGE
ecr-secret   kubernetes.io/dockerconfigjson   1      6m6s

Create a pod that mounts the shared file system and uses the Amazon ECR-based secret to pull the image:

$ cat cobol-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: cobol-pod
spec:
  volumes:
  - name: efs-vol1
    persistentVolumeClaim:
      claimName: pvc-efs-shared-volume1


  imagePullSecrets:
  - name: ecr-secret

  containers:
  - name: cobol-container
    image: "111122223333.dkr.ecr.ap-southeast-2.amazonaws.com/my-ecr-repo:cobol"
    volumeMounts:
    - mountPath: "/nfs_dir"
      name: efs-vol1

$ oc create -f cobol-pod.yaml 
pod/cobol-pod created

$ oc get pods cobol-pod 
NAME        READY   STATUS    RESTARTS   AGE
cobol-pod   1/1     Running   0          15m

5. Test the application

Confirm that the Amazon EFS file system has been mounted on the pod:

$ oc exec -it cobol-pod -- df -h | grep nfs
127.0.0.1:/     8.0E     0  8.0E   0% /nfs_dir

Confirm that the cron job has been configured properly on the pod:

oc exec -it cobol-pod -- crontab -l | grep batch
* * * * * /home/batch-process.sh >> /var/log/cron.log 2>&1

It’s time to test the functionality of the application. We just need to copy a CSV file into /nfs_dir/input, which resides on the Amazon EFS file system. The application we created above checks /nfs_dir/input every minute and processes any CSV files it finds.

What is the application doing?

There is currently no CSV file in the shared file system to process, so the application just logs a completion timestamp every minute.

This is the log of the last three minutes:

$ oc exec -it cobol-pod -- tail -3 /var/log/cron.log
The batch job has been completed at Sat Jun 18 04:28:01 UTC 2022
The batch job has been completed at Sat Jun 18 04:29:01 UTC 2022
The batch job has been completed at Sat Jun 18 04:30:01 UTC 2022

Give the application a CSV file to process:

A CSV file can be copied to Amazon EFS from any workstation or application that has mounted the file system. For simplicity, we use oc cp:

# oc cp new-csv-file cobol-pod:/nfs_dir/input

Poll the log file:

# oc exec -it cobol-pod -- tail -f /var/log/cron.log
The batch job has been completed at Tue Jan 18 04:38:01 UTC 2022
The batch job has been completed at Tue Jan 18 04:39:01 UTC 2022
/nfs_dir/input/new-csv-file has been processed.
The batch job has been completed at Tue Jan 18 04:40:01 UTC 2022

Prerequisites

To follow the procedures in this post, we recommend the following prerequisites:

  • A Linux/Mac workstation, or an AWS Cloud9 IDE.
  • A running Red Hat OpenShift Service on AWS (ROSA) cluster. This post explains how to set up such an environment from scratch.

Cleanup

To avoid unexpected costs, remember to clean up the resources that are no longer needed:

AWS CodeCommit and Amazon ECR: Use the AWS Management Console or the AWS CLI. The Amazon EFS file system and the ROSA cluster also incur charges and should be removed if they were created only for this exercise; a CLI sketch follows below.
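The following is a sketch of the cleanup with the AWS CLI and the rosa CLI; the repository names, cluster name, and EFS IDs are placeholders to replace with your own values:

# Delete the ECR repository and all images in it
aws ecr delete-repository --repository-name my-ecr-repo --force

# Delete the CodeCommit repository, if one was created
aws codecommit delete-repository --repository-name my-code-repo

# Delete the EFS access point, mount target(s), and file system
aws efs delete-access-point --access-point-id fsap-0123456789abcdef0
aws efs delete-mount-target --mount-target-id fsmt-0123456789abcdef0
aws efs delete-file-system --file-system-id fs-0123456789abcdef0

# Delete the ROSA cluster when it is no longer needed
rosa delete cluster --cluster=my-rosa-cluster --yes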

Conclusion

We have shown you how to re-platform and even augment a COBOL application using container technologies and Red Hat OpenShift Service on AWS (ROSA). While this is not a solution for modernizing every legacy application, especially those tightly integrated with legacy middleware and data layers, this post offers new avenues to bring other application types to AWS with minimal to no change to the code.

If you are interested in trying this in your environment, we encourage you to take this ROSA workshop and get some hands-on experience first. If you are new to Amazon EFS, follow the Amazon EFS user guide. For any further assistance, please use AWS Premium Support or AWS re:Post.

Mehdi Salehi

Mehdi is a Sr. Solutions Architect based in Sydney. He helps AWS partners with technical strategy and architectural guidance in order to bring value to end customers. Outside work, you may find him somewhere in the wonderful nature of Australia.

Hantzley Tauckoor

Hantzley Tauckoor is an APJ Partner Solutions Architecture Leader based in Singapore. He has 20 years’ experience in the ICT industry spanning multiple functional areas, including solutions architecture, business development, sales strategy, consulting, and leadership. He leads a team of Senior Solutions Architects that enable partners to develop joint solutions, build technical capabilities, and steer them through the execution phase as customers migrate and modernize their applications to AWS. Outside work, he enjoys spending time with his family, watching movies, and hiking.