AWS Marketplace

How to install a self-serve data lake using TCS Connected Intelligence Data Lake

by guest author James Carden, Solution Architect, TCS

The TCS Connected Intelligence Data Lake for Business (CIDL) is a self-serve data lake platform that helps business and IT teams turn raw data into insights. In this blog post, I will describe the steps to subscribe to, install, and test CIDL.

Prerequisites

  1. Log in to your AWS account. If you don’t have an AWS account, set one up now.
  2. Navigate to the CIDL product page. You can do this by choosing the CIDL link or by searching for “TCS” on the AWS Marketplace home page.

Launching the product

  1. Choose Continue to Subscribe.
  2. Review the terms and conditions and choose Accept Terms. After your request is processed, you’re notified by email and with a banner on the product page.
  3. Choose Continue to Configuration.
  4. For Region, choose where you want to launch the CIDL application (currently, CIDL is only available in U.S. Regions), and then choose Continue to Launch.
  5. On the next page, for EC2 Instance Type, choose the instance you want. There are three recommended instance types on the Product Page. For this setup, I used the h1.8xlarge instance. For larger data lake implementations, you can select the h1.16xlarge instance. The d2.8xlarge instance type is provided to enable CIDL deployment in Regions that do not have h1 instances available.
  6. For VPC Settings, choose the VPC where you want to deploy the instance.
  7. For Subnet Settings, choose one of the available subnets. Make sure to choose a publicly accessible subnet so that your VPN clients can reach the application over the internet.
  8. In the Security Group Settings section, either select an existing security group or choose Create New Based on Seller Settings. If you choose to create a new security group, please do the following:
  • Choose a name for the security group that you’re creating and enter a description.
  • Review the list of open ports and protocols. Note: As a best practice, change the CIDR ranges from 0.0.0.0/0 to the specific IP ranges that need access, so that the ports are not open to the public internet.
  • Choose Save.
  9. For Key Pair Settings, select an existing key pair or create one by choosing Create a key pair in EC2. This key is installed on the EC2 instance, allowing you to have SSH access.
  10. Choose Launch.

Setting up the security group

The inbound rules of the security group control which external applications can talk to CIDL, so review your settings carefully. You want to make the application accessible to users and systems that must use the data in the data lake while limiting access to only the desired applications, ports, and IP addresses. The table below is an example of the applications and respective ports for which I’ve enabled access, along with the role associated with each:

Application      Port    Role
SSH              22      Linux/System Admin
Database         5432    Database Admin
Apache Ambari    8080    Hadoop Admin
CIP-Portal       8443    CIP Admin/Project Team
DSS_API          9060    External Teams and Applications
DRILL UI         8047    Database Admin

Most of the applications for which I enabled access are only for users in admin roles, following the principle of least privilege. In this example, the only ports that require general access are 8443 for the CIP Portal (project teams) and 9060 to access the CIDL data services (external applications). It’s a good idea to strictly limit the IP addresses that can access these ports. Since it’s relatively easy to update your security settings from the EC2 console, I recommend limiting access initially. You can open access to new users when required.
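
If you prefer the AWS CLI to the EC2 console, the following is a minimal sketch of how the two general-access ports could be restricted to a single address range. The function name ingress_commands, the security group ID, and the CIDR range are hypothetical placeholders, not values from CIDL. The script only prints the commands so that you can review them before running anything.

```shell
#!/usr/bin/env bash
# Sketch: print one "authorize-security-group-ingress" command per port.
# Nothing is executed against AWS; the commands are only echoed for review.

ingress_commands() {
  # $1 = security group ID, $2 = allowed CIDR range, remaining args = TCP ports
  local group_id="$1"
  local cidr="$2"
  shift 2
  local port
  for port in "$@"; do
    echo "aws ec2 authorize-security-group-ingress --group-id $group_id --protocol tcp --port $port --cidr $cidr"
  done
}

# Hypothetical values: replace with your own security group ID and office/VPN range.
ingress_commands sg-0123456789abcdef0 203.0.113.0/24 8443 9060
```

Once the printed commands look right, you can paste them into a shell that has AWS credentials configured. Admin-only ports such as 22, 5432, 8080, and 8047 can be handled the same way with a narrower range.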

Associating an Elastic IP address

As a best practice, associate an Elastic IP address with your EC2 instance. This way, the CIDL application is associated with a consistent IP address. This public IP serves as the access point for all external interactions. For more information, see Elastic IP Addresses in the Amazon EC2 User Guide for Linux Instances.
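
The same association can be done from the AWS CLI. The sketch below assumes the AWS CLI is installed and configured for the Region where CIDL runs; the function name associate_elastic_ip and the instance ID in the usage comment are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Sketch: allocate a new Elastic IP address and attach it to an instance.

associate_elastic_ip() {
  # $1 = ID of the EC2 instance running CIDL
  local instance_id="$1"
  local allocation_id

  # Allocate an address for use in a VPC and capture only its allocation ID.
  allocation_id="$(aws ec2 allocate-address --domain vpc \
    --query AllocationId --output text)"

  # Associate the newly allocated address with the instance.
  aws ec2 associate-address --instance-id "$instance_id" \
    --allocation-id "$allocation_id"
}

# Usage (hypothetical instance ID):
#   associate_elastic_ip i-0123456789abcdef0
```

Keep in mind that an Elastic IP address that is allocated but not associated with a running instance can incur charges, so release any address you no longer need.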

Setting up CIDL

Once you have subscribed to CIDL, you must launch your EC2 instance to complete the setup of the CIDL application. To do this, right-click the CIDL instance in the EC2 console and select Connect. The dialog that opens shows how to connect to the EC2 instance.

Once the instance is initialized, follow these steps to complete the CIDL installation.

  1. Connect to your instance using a secure shell (SSH) utility, such as PuTTY. We assume that if you are reading this blog post, you have used a secure shell utility and understand the process. For more information, see Connect to your Linux Instance in the Amazon EC2 User Guide for Linux Instances. SSH communicates over port 22; to allow access to the EC2 instance, your IP address must be added to the security group settings.

The first time you access the CIDL instance, you must use the key pair that you previously associated with the instance and log in as ec2-user. Once you have logged in as ec2-user, you can begin setting up CIDL.
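
If you use OpenSSH from a terminal instead of PuTTY, the first login might look like the sketch below. The key file path, the hostname, and the helper name ssh_login_command are hypothetical placeholders; only the ec2-user login name comes from this post. The script prints the command rather than opening a session.

```shell
#!/usr/bin/env bash
# Sketch: build the OpenSSH command for the first login as ec2-user.

# Hypothetical placeholders: replace with your key file and instance hostname.
KEY_FILE="${KEY_FILE:-$HOME/.ssh/cidl-key.pem}"
CIDL_HOST="${CIDL_HOST:-ec2-203-0-113-10.compute-1.amazonaws.com}"

ssh_login_command() {
  # $1 = path to the private key (.pem), $2 = public hostname or Elastic IP
  printf 'ssh -i %s ec2-user@%s\n' "$1" "$2"
}

# OpenSSH rejects private keys that other users can read, so restrict the
# key file first, for example: chmod 400 "$KEY_FILE"
ssh_login_command "$KEY_FILE" "$CIDL_HOST"
```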

  2. To start the installation, log in to the default CIDL admin account. CIDL comes with a default admin user called cipuser. Set a password for cipuser by issuing the following command:

sudo passwd cipuser

After issuing this command, you should see the following prompt:

Changing password for user cipuser.

New password:

Retype new password:

Once you set the new password, a successful response looks like this:

passwd: all authentication tokens updated successfully.

  3. Once the password for cipuser has been set, you must switch from ec2-user to cipuser by issuing the following command:

su - cipuser

The system now identifies you as cipuser, the default CIDL admin.

  4. Run the rest of the setup script by issuing the following command:

./setup-start.sh

It takes the setup script approximately 20–25 minutes to configure the application for first-time use. Once the script has finished running, you are returned to the command prompt.

  5. Set up sshd_config to allow password authentication. To do this:
  • Open the sshd_config file in vi by issuing the following command:

sudo vi /etc/ssh/sshd_config

  • Scroll down to the line that says # PasswordAuthentication yes and remove the #, as shown in the second line in the following screenshot.

[Screenshot: password authentication]

  • Save the file by pressing the Escape key and entering the command :wq!, and then apply the changes by running the command sudo service sshd restart.
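
As an alternative to editing the file by hand in vi, the same change can be scripted. The sketch below assumes the line appears in the commented-out form described above (a # followed by PasswordAuthentication yes); the function name enable_password_auth is a hypothetical helper, not part of CIDL.

```shell
#!/usr/bin/env bash
# Sketch: uncomment the "PasswordAuthentication yes" line in an sshd_config file.

enable_password_auth() {
  # $1 = path to the sshd_config file to edit
  local config_file="$1"
  local tmp_file
  tmp_file="$(mktemp)"

  # Strip the leading "#" (and any spaces after it) from the matching line only.
  # Writing back with cat keeps the original file's owner and permissions.
  sed -E 's/^#[[:space:]]*PasswordAuthentication[[:space:]]+yes[[:space:]]*$/PasswordAuthentication yes/' \
    "$config_file" > "$tmp_file" && cat "$tmp_file" > "$config_file"

  rm -f "$tmp_file"
}

# Usage (needs root to write the file), followed by the restart described above:
#   enable_password_auth /etc/ssh/sshd_config
```

It is worth keeping a backup copy of sshd_config before running this, since a broken SSH configuration can lock you out of the instance.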

CIDL is now ready for use.

Testing your deployment

To test the application, open your web browser and navigate to the following URL:

https://<<EC2 instance-public-hostname>>:8443/CIP-Portal

Your browser may display a certificate warning at this stage because the portal uses a self-signed certificate. You can proceed past the warning; no other action on your part is needed.
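
You can also check from a terminal that the portal is answering. This is a minimal sketch that assumes curl is installed; the helper names portal_url and check_portal and the hostname in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: probe the CIP-Portal URL and print the HTTP status code it returns.

portal_url() {
  # $1 = public hostname or Elastic IP of the CIDL instance
  printf 'https://%s:8443/CIP-Portal\n' "$1"
}

check_portal() {
  # -k accepts the self-signed certificate, -s hides progress output,
  # -o /dev/null discards the page body, -w prints only the status code.
  curl -k -s -o /dev/null -w '%{http_code}\n' "$(portal_url "$1")"
}

# Usage (hypothetical hostname):
#   check_portal ec2-203-0-113-10.compute-1.amazonaws.com
```

Any HTTP status code in the output indicates that the port is reachable and the web server is responding. If the command prints 000 or hangs, check the security group rule for port 8443.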

When you see the login screen shown below, your installation has completed successfully.

[Screenshot: CIDL login page]

The initial login credentials for CIDL are:

Login: Admin

Password: <<EC2 instance id>>

As with any application that ships with a default password, change your password after the first login.

Conclusion

In this blog post, I have shown you how to subscribe to and set up CIDL to enable access to a self-serve data lake for your users. Users can now explore their analytics use cases.

For more information on TCS Digital Software and Solutions group, view the TCS seller profile. To find out more or to subscribe to TCS Connected Intelligence Data Lake for Business, view the CIDL product listing in AWS Marketplace.

The content and opinions in this post are those of the third-party author and AWS is not responsible for the content or accuracy of this post.