AWS Big Data Blog

Enable private access to Amazon Redshift from your client applications in another VPC

You can now use an Amazon Redshift-managed VPC endpoint (powered by AWS PrivateLink) to connect to your private Amazon Redshift cluster with the RA3 instance type within your virtual private cloud (VPC). With an Amazon Redshift-managed VPC endpoint, you can privately access your Amazon Redshift data warehouse within your VPC from client applications in another VPC within the same AWS account, in another AWS account, or running on-premises, without using public IPs or requiring encrypted traffic to traverse the internet.

This post introduces AWS PrivateLink and Amazon Redshift-managed VPC endpoints and how you can access your private Amazon Redshift cluster in another VPC. We show you how to authorize access to create endpoints to your Amazon Redshift cluster from another account and create Amazon Redshift-managed VPC endpoints to your Amazon Redshift cluster.

Why do you need Amazon Redshift-managed VPC endpoints?

You might have your client applications running on a separate VPC, perhaps even in another AWS account, because a different organization owns your business intelligence (BI) or extract, transform, and load (ETL) tools. In some cases, these tools might be running on-premises, and you need to access the cluster without having to access the public internet.

AWS PrivateLink keeps all network traffic between AWS services within the AWS network, and does so in a highly available and scalable manner. When you create an Amazon Redshift-managed VPC endpoint, these service endpoints appear as elastic network interfaces with a private IP address in your target VPC.

Before Amazon Redshift-managed VPC endpoints, you had to run your consumption workloads, such as Amazon QuickSight dashboards, on the same VPC as the cluster, run the cluster in a public subnet, or deploy and manage a Network Load Balancer and automate its target group to point to the active IP address associated with the Amazon Redshift endpoint in order to expose access to clients. The following diagram illustrates this architecture.

Now that Amazon Redshift supports cross-VPC access using Amazon Redshift-managed VPC endpoints, you can configure Amazon Redshift clusters to expose additional endpoints running on public or private subnets within the same VPC, a different VPC, or a different AWS account. This adds a layer of security for access to your clusters regardless of where they run, with no infrastructure to manage. The following diagram illustrates this updated architecture of using Amazon Redshift-managed VPC endpoints in the same VPC.

The following diagram shows the architecture of using Amazon Redshift-managed VPC endpoints on a different AWS account.

For this post, we discuss a use case in which an end-user such as a data engineer or data analyst uses an open-source SQL editor (SQL Workbench/J) to connect to a private cluster from a customer-facing subnet in another VPC.

Before Amazon Redshift-managed VPC endpoints

In our use case, before Amazon Redshift-managed VPC endpoints, the network admin monitored the IP address of the Amazon Redshift leader node and updated the load balancers to route to it correctly, so that the data analyst had a working connection string to use with their BI tool (see the following architectural diagram).

After Amazon Redshift-managed VPC endpoints

After Amazon Redshift-managed VPC endpoints were configured (see the following architectural diagram), the network admin no longer needs to manage the cluster or the load balancer, because this is managed by Amazon Redshift. The network admin can provide the connection string to the data analyst after the initial setup, and allow the data analyst to connect to their BI tool without further involvement from the network admin team.

Using this network architecture allows you to simplify the design, while increasing security by limiting the access to your private subnets and only allowing select clients through your endpoint.

Set up the solution

To implement this solution, you need to complete the following high-level steps:

  1. Enable cluster relocation.
  2. Authorize access to additional accounts (optional).
  3. Create target subnet groups.
  4. Create an Amazon Redshift-managed VPC endpoint.

The following sections walk you through how to configure these components to access an Amazon Redshift cluster from an Amazon Redshift-managed VPC endpoint in the same account, and also highlight optional steps if your clients reside on another AWS account.

Enable cluster relocation

Cluster relocation enables you to move a cluster to another Availability Zone without any loss of data or changes to your application.

You can enable cluster relocation through the AWS Management Console or the AWS Command Line Interface (AWS CLI) in two different ways: during cluster creation, or after launching the cluster. We walk you through both options, which you should complete from the cluster account.

Enable cluster relocation during cluster creation

If you’re creating a new cluster, complete the following steps:

  1. On the Amazon Redshift console, while creating the cluster, disable the Use defaults option under Additional configurations.

This exposes a set of options to override default behaviors.

  2. In the Backup section, for Cluster relocation, select Enable.

You can also enable cluster relocation during cluster creation through the AWS CLI using the following command:

aws redshift create-cluster --cluster-identifier mycluster --number-of-nodes 2 --master-username adminuser --master-user-password TopSecret1 --node-type ra3.4xlarge --port 5439 --no-publicly-accessible --availability-zone-relocation

Enable cluster relocation on an existing cluster

If you’re modifying an existing cluster, complete the following steps:

  1. On the Amazon Redshift console, choose the cluster.
  2. On the Backup tab, choose Edit.
  3. In the edit window, enable cluster relocation.

You can also enable cluster relocation through the AWS CLI API:

aws redshift modify-cluster --cluster-identifier mycluster --availability-zone-relocation
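To confirm that the change took effect, you can query the cluster's relocation status. The following is a minimal sketch, not an official procedure: the function name relocation_status is our own, and it assumes the describe-clusters response exposes an AvailabilityZoneRelocationStatus field, so verify the field name against your AWS CLI version.

```shell
# Sketch: print the Availability Zone relocation status of a cluster.
# relocation_status is a hypothetical helper name, not part of the AWS CLI.
relocation_status() {
  cluster_id="$1"
  aws redshift describe-clusters \
    --cluster-identifier "$cluster_id" \
    --query 'Clusters[0].AvailabilityZoneRelocationStatus' \
    --output text
}

# Usage (requires configured AWS credentials for the cluster account):
#   relocation_status mycluster
```

Run the function from the cluster account, the same account in which you ran modify-cluster.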

Authorize access to additional AWS accounts (optional)

If you want to allow additional AWS accounts to create endpoints for your cluster, this section walks through the steps required to authorize access. If your endpoints reside in the same account as the cluster, you can skip this section because the same account is authorized by default.

  1. On the Amazon Redshift console, choose the cluster.
  2. On the Properties tab, in the Granted accounts section, choose Grant access.
  3. For AWS Account ID, enter the target AWS account ID.
  4. Select Grant access only to specific VPCs.
  5. For Virtual private cloud (VPC), you can choose to restrict access to specific VPCs or to the entire account.
  6. Choose Grant access.

You have now authorized your cluster to deploy endpoints in additional accounts with the option to specify target VPCs.

You can also allow additional accounts to deploy cluster endpoints through the AWS CLI API. To grant access to all the VPCs in the target account, enter the following code:

aws redshift authorize-endpoint-access --cluster-identifier mycluster --account <target_account>

To grant access to a specific VPC in the target account, enter the following code:

aws redshift authorize-endpoint-access --cluster-identifier mycluster --account <target_account> --vpc-ids <vpc_id>
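After granting access, it can be useful to list what the cluster has authorized. The sketch below is an illustration under assumptions: list_endpoint_authorizations is a hypothetical helper name, and the selected fields (Grantee, Status, AllowedAllVPCs) reflect our understanding of the describe-endpoint-authorization response, so check them against the CLI reference.

```shell
# Sketch: list the accounts a cluster has authorized to create endpoints.
# list_endpoint_authorizations is a hypothetical helper name.
list_endpoint_authorizations() {
  cluster_id="$1"
  aws redshift describe-endpoint-authorization \
    --cluster-identifier "$cluster_id" \
    --query 'EndpointAuthorizationList[].[Grantee,Status,AllowedAllVPCs]' \
    --output table
}

# Usage (run from the cluster account):
#   list_endpoint_authorizations mycluster
```

If an account should no longer have access, the companion command aws redshift revoke-endpoint-access takes the same --cluster-identifier and --account options.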

Create target subnet groups

After you authorize access, you need to define the subnet groups in the target account in which the endpoint should be deployed.

  1. In the target account, on the Amazon Redshift console, choose Configurations.
  2. Choose Subnet groups and choose Create subnet group.
  3. For Name, enter a name for your subnet group.
  4. In the Subnets section, specify the target VPC and choose the appropriate subnets to deploy the endpoint against.

For this post, we use two public subnets on a second VPC, but you can enter subnets as appropriate to your use case.

  5. Choose Create cluster subnet group.

You can also create a subnet group through the AWS CLI API:

aws redshift create-cluster-subnet-group --cluster-subnet-group-name mysubnetgroup --description "My subnet group" --subnet-ids <subnet_id>

Create an Amazon Redshift-managed VPC endpoint for Amazon Redshift

You’re now ready to create the endpoint for the Amazon Redshift cluster.

  1. In the cluster account, on the Amazon Redshift console, choose Configurations.
  2. Choose Create endpoint.
  3. In the Endpoint settings section, specify the target account and choose the appropriate VPC and subnet group to deploy the endpoint against.

As part of this step, you must provide a security group to use as a part of your endpoint. This is the critical step in which you can define a secure endpoint to limit what ports, protocols, and sources for inbound traffic you’re authorizing into your endpoint. The common practice is to allow port 5439 (Amazon Redshift connectivity port) to the security group or CIDR range in which your consumption workloads run.

  4. Choose Create endpoint.

You can also create the Amazon Redshift-managed VPC endpoint through the AWS CLI API:

aws redshift create-endpoint-access --cluster-identifier mycluster --resource-owner <account> --endpoint-name myendpoint --subnet-group-name mysubnetgroup --vpc-security-group-ids <security_group_id>
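The security group you attach to the endpoint needs an inbound rule for the Amazon Redshift port. As a minimal sketch of the common practice described above, the following helper opens port 5439 to a CIDR range; the function name allow_redshift_ingress and the example values are placeholders we made up for illustration.

```shell
# Sketch: allow inbound Amazon Redshift traffic (TCP 5439) to the endpoint's
# security group from the CIDR range where client workloads run.
# allow_redshift_ingress is a hypothetical helper name.
allow_redshift_ingress() {
  endpoint_sg_id="$1"   # security group attached to the endpoint
  client_cidr="$2"      # CIDR range of your consumption workloads
  aws ec2 authorize-security-group-ingress \
    --group-id "$endpoint_sg_id" \
    --protocol tcp \
    --port 5439 \
    --cidr "$client_cidr"
}

# Usage (placeholder values; substitute your own):
#   allow_redshift_ingress <security_group_id> 10.0.0.0/24
```

Keeping the source range as narrow as possible is what makes the endpoint an effective enforcement point. To allow a client security group instead of a CIDR range, authorize-security-group-ingress also accepts --source-group in place of --cidr.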

Use the Amazon Redshift-managed endpoint

After you create the endpoint, you can see your endpoint on the Configurations page on the Amazon Redshift console. Choosing the endpoint displays the endpoint information and different connection strings based on the consumption strategy.

You can also describe the Amazon Redshift-managed VPC endpoint through the AWS CLI API:

aws redshift describe-endpoint-access --endpoint-name myendpoint
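If you only need the connection details, you can narrow the output with a query. This is a sketch under assumptions: endpoint_host_port is a hypothetical helper name, and it assumes the response lists endpoints under EndpointAccessList with Address and Port fields, which you should confirm for your CLI version.

```shell
# Sketch: print the address and port of a Redshift-managed VPC endpoint,
# separated by a tab. endpoint_host_port is a hypothetical helper name.
endpoint_host_port() {
  endpoint_name="$1"
  aws redshift describe-endpoint-access \
    --endpoint-name "$endpoint_name" \
    --query 'EndpointAccessList[0].[Address,Port]' \
    --output text
}

# Usage:
#   endpoint_host_port myendpoint
```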

Following our use case, we use an Amazon Elastic Compute Cloud (Amazon EC2) instance running SQL Workbench/J on our target account, which our data analysts use to query Amazon Redshift securely. We can deploy the endpoint under multiple network topologies; we provide some common examples in this section.

Cross-VPC access for internet-based workloads without VPC peering

In this scenario, the data analysts access a workspace located in our target account over the internet, where they can start SQL Workbench/J or an equivalent application running in the public subnet. The Amazon Redshift-managed VPC endpoint for the Amazon Redshift cluster is deployed to the same VPC but on the private subnet, and the cluster is deployed to the private subnet of the cluster account. The following diagram illustrates this architecture.

Cross-VPC access for internet-based workloads with VPC peering

In this scenario, the data analysts access a workspace located in our target account over the internet, where they can start SQL Workbench/J or an equivalent solution deployed in the public subnet. Unlike the previous example, the Amazon Redshift-managed VPC endpoint for the Amazon Redshift cluster is deployed in the public subnet of the same VPC as the Amazon Redshift cluster, which requires the VPCs in the target account and cluster account to be peered in order to expose routes between them. The cluster is deployed to the private subnet of the cluster account. The following diagram illustrates this architecture.

Cross-VPC access for on-premises workstations

In this scenario, the data analysts access an on-premises workspace on which SQL Workbench/J or an equivalent tool is deployed. Their networking team needs to configure routing between their on-premises network and AWS, commonly done through AWS Direct Connect and AWS Transit Gateway, which can route traffic to the public subnet of the cluster account that contains the Amazon Redshift-managed VPC endpoint for the Amazon Redshift cluster. The cluster is deployed to the private subnet of the same account (see the following architectural diagram).

Configure cross-VPC access

For this post, we demonstrate how to configure the first scenario—cross-VPC access for internet-based workloads without VPC peering. To achieve this, we complete the following steps:

  1. Configure the route table.
  2. Configure the security group for our EC2 instance.
  3. Deploy and test the connection.

Configure the route table

Depending on how you choose to deploy your endpoint and clients, you may need to make changes to your route table to allow traffic between the networks. For some cases, such as when the endpoint resides on the same VPC as the SQL Workbench/J client, no additional changes are needed because the route is local to your VPC. In others, such as with internet-based workloads with VPC peering, you may need to make additional changes, such as allowing routes to traverse through the peered connection. In cases where you need to communicate to on-premises networks, you may need to configure your route tables to send traffic through an AWS Transit Gateway attachment. For more information about these different configurations, see Example routing options.
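For the VPC peering case, the route change can be sketched with the AWS CLI as follows. This is an illustration only and isn't needed for the scenario configured in this post; add_peering_route is a hypothetical helper name, and all IDs and CIDR ranges are placeholders.

```shell
# Sketch: send traffic destined for the peer VPC's CIDR range through an
# existing VPC peering connection. add_peering_route is a hypothetical name.
add_peering_route() {
  route_table_id="$1"   # route table of the client subnet
  dest_cidr="$2"        # CIDR range of the peer VPC
  pcx_id="$3"           # ID of the VPC peering connection
  aws ec2 create-route \
    --route-table-id "$route_table_id" \
    --destination-cidr-block "$dest_cidr" \
    --vpc-peering-connection-id "$pcx_id"
}

# Usage (placeholder values; substitute your own):
#   add_peering_route <route_table_id> 10.1.0.0/16 <peering_connection_id>
```

Peering routes are not symmetric by default, so a matching route is typically needed in the other VPC's route table as well.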

Configure a security group for the EC2 instance

Next, we create a security group and assign it to the EC2 instance on which we deploy SQL Workbench/J, so that we can access the instance from our workstation. Several options are available depending on your specific use case.

For this post, we present a simple solution: allowing RDP access from our current IP address.

  1. In the target account, on the Amazon VPC console, choose Security groups.
  2. Choose Create security group.
  3. Provide the relevant information and choose the VPC you configured previously.
  4. Add a new inbound rule and choose RDP as the protocol.
  5. Enter your IP address as the source.

If you’re not sure what your IP address is, you can search “what is my IP” in your preferred search engine to get a result with your public IP address.
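The same rule can be added from the command line. The following is a minimal sketch under assumptions: current_ip_cidr and allow_rdp_from are hypothetical helper names, and the sketch relies on the checkip.amazonaws.com service returning your public IP address as plain text.

```shell
# Sketch: look up the caller's public IP and allow RDP (TCP 3389) from it.
# Both function names are hypothetical helpers, not AWS CLI commands.
current_ip_cidr() {
  # checkip.amazonaws.com responds with the caller's public IP as plain text
  ip="$(curl -s https://checkip.amazonaws.com)"
  printf '%s/32\n' "$ip"
}

allow_rdp_from() {
  sg_id="$1"   # security group assigned to the EC2 instance
  cidr="$2"    # source range, for example the output of current_ip_cidr
  aws ec2 authorize-security-group-ingress \
    --group-id "$sg_id" \
    --protocol tcp \
    --port 3389 \
    --cidr "$cidr"
}

# Usage:
#   allow_rdp_from <security_group_id> "$(current_ip_cidr)"
```

The /32 suffix limits the rule to a single address, which matches the intent of exposing RDP only to your own workstation.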

Deploy and test the connection

Finally, we can launch a Windows instance to deploy our visual editor.

  1. On the Amazon EC2 console, choose Launch instance.
  2. Choose the relevant Windows AMI provided by your organization (for this post, we use the Microsoft Windows Server 2019 Base image provided by Amazon and the t2.large size).
  3. Choose the VPC you previously configured and the target subnet where your users access the environment.
  4. Choose the security group you created in the previous step and a key pair to launch the instance.
  5. When the instance is running, retrieve the Windows password and connect to it. For instructions, see Connect to your Windows instance.
  6. When you’re connected, download your visual editor and drivers.
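The launch and password retrieval steps above can also be sketched with the AWS CLI. Treat this as an illustration under assumptions: the function names are hypothetical, the AMI, subnet, security group, and key pair values are placeholders you must supply, and t2.large mirrors the size used in this post.

```shell
# Sketch: launch the Windows instance and retrieve its administrator password.
# launch_workbench_instance and get_windows_password are hypothetical names.
launch_workbench_instance() {
  ami_id="$1"; subnet_id="$2"; sg_id="$3"; key_name="$4"
  # Prints the ID of the new instance
  aws ec2 run-instances \
    --image-id "$ami_id" \
    --instance-type t2.large \
    --subnet-id "$subnet_id" \
    --security-group-ids "$sg_id" \
    --key-name "$key_name" \
    --query 'Instances[0].InstanceId' \
    --output text
}

get_windows_password() {
  instance_id="$1"
  key_file="$2"   # path to the private key file of the launch key pair
  aws ec2 get-password-data \
    --instance-id "$instance_id" \
    --priv-launch-key "$key_file" \
    --query 'PasswordData' \
    --output text
}

# Usage (placeholder values; substitute your own):
#   id="$(launch_workbench_instance <ami_id> <subnet_id> <security_group_id> <key_name>)"
#   get_windows_password "$id" <path_to_private_key>
```

The password is usually not available immediately after launch, so you may need to retry get_windows_password after a few minutes.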

For this post, we connect to our cluster with SQL Workbench/J. Be sure to append the database name at the end of your Amazon Redshift-managed VPC endpoint connection (for this post, we use /dev).
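As a small illustration of how the pieces fit together, the following sketch assembles a JDBC URL from the endpoint address, the port, and the database name. The function name redshift_jdbc_url and the example host are hypothetical; the defaults of 5439 and dev mirror the values used in this post.

```shell
# Sketch: build a JDBC URL for an Amazon Redshift-managed VPC endpoint.
# redshift_jdbc_url is a hypothetical helper name.
redshift_jdbc_url() {
  host="$1"           # endpoint address shown on the console
  port="${2:-5439}"   # defaults to the Amazon Redshift port
  db="${3:-dev}"      # defaults to the database used in this post
  printf 'jdbc:redshift://%s:%s/%s\n' "$host" "$port" "$db"
}

# Usage (the host below is a made-up placeholder):
#   redshift_jdbc_url myendpoint.example.com
#   prints jdbc:redshift://myendpoint.example.com:5439/dev
```

Paste the resulting URL into the connection profile of SQL Workbench/J along with your database credentials.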

At this point, you can connect and run queries securely against your Amazon Redshift cluster using your Amazon Redshift-managed VPC endpoint.

Conclusion

In this post, we walked through reference access patterns that are now simplified and that add an additional layer of security for access to your private Amazon Redshift clusters from clients running on another VPC in the same account, on a VPC in another account, or even on-premises. Amazon Redshift-managed VPC endpoints not only offer the ability to expose managed endpoints to access resources on different subnets, but also provide an additional security enforcement point to limit access to your cluster to only known access patterns.

If you have any questions, please leave a comment.


About the Authors

Bernie Herdan is a Global Accounts Solutions Architect with AWS Global Financial Services based in New York. He works with large financial institutions to help them build secure and scalable solutions on AWS, to drive innovation and achieve their business outcomes.

Debu Panda, a senior product manager at AWS, is an industry leader in analytics, application platform, and database technologies and has more than 20 years of experience in the IT world.

Patrick Huang is a senior software engineer for Amazon Redshift, where he leads and builds cutting-edge features for the Redshift cloud infrastructure. He is passionate about learning new technologies and innovating for the Redshift service. He holds a B.A. in Computer Science from the University of California, Berkeley. Outside of work, he enjoys playing basketball.

Saravanaraj Velusamy is a Senior Software Engineer at Amazon Redshift, where he works on building next generation features for Redshift. More recently his work focuses on the areas at the intersection of security, networking and databases. Outside of work, he likes to read and reflect on teachings from ancient Greek and Indian schools of philosophy, play frisbee and practice yoga.

Prasanna Sridharan is a Senior Data & Analytics Architect with AWS. He is passionate about building the right big data solution for the AWS customers. He is specialized in the design and implementation of Analytics, Data Management and Big Data systems, mainly for Enterprise and FSI customers.