AWS Big Data Blog
Enable private access to Amazon Redshift from your client applications in another VPC
November 2023: This post was reviewed and updated to include configurations and options for Amazon Redshift Serverless.
You can now use an Amazon Redshift-managed VPC endpoint (powered by AWS PrivateLink) to connect to your private Amazon Redshift cluster with the RA3 instance type or Amazon Redshift Serverless within your virtual private cloud (VPC). With an Amazon Redshift-managed VPC endpoint, you can privately access your Amazon Redshift data warehouse within your VPC from client applications in another VPC within the same AWS account, in another AWS account (for provisioned clusters only), or running on-premises, without using public IPs or requiring encrypted traffic to traverse the internet.
This post introduces AWS PrivateLink and Amazon Redshift-managed VPC endpoints and explains how you can access your private Amazon Redshift cluster in another VPC. We show you how to authorize access to create endpoints to your Amazon Redshift cluster from another VPC in the same account, and how to create Amazon Redshift-managed VPC endpoints to your Amazon Redshift cluster.
Why do you need Amazon Redshift-managed VPC endpoints?
You might have your client applications running in a separate VPC, perhaps even in another AWS account, because a different organization owns your business intelligence (BI) or extract, transform, and load (ETL) tools. In some cases, these tools might be running on-premises, and you need to access the cluster without going over the public internet.
AWS PrivateLink keeps all network traffic between AWS services within the AWS network, and does so in a highly available and scalable manner. When you create an Amazon Redshift-managed VPC endpoint, the service endpoints appear as elastic network interfaces with private IP addresses in your target VPC.
Before Amazon Redshift-managed VPC endpoints, you had to run your consumption workloads, such as Amazon QuickSight dashboards, in the same VPC as the cluster, run the cluster in a public subnet, or deploy and manage a Network Load Balancer and automate its target group to point to the active IP address associated with the Amazon Redshift endpoint in order to expose access to clients. The following diagram illustrates this architecture.
Now that Amazon Redshift supports cross-VPC access using Amazon Redshift-managed VPC endpoints, you can configure Amazon Redshift clusters to expose additional endpoints running on public or private subnets within the same VPC, different VPC, or different AWS accounts, which enables you to add an additional layer of security to access your clusters regardless of where they run, with no infrastructure to manage. The following diagram illustrates this updated architecture of using Amazon Redshift-managed VPC endpoints in the same VPC.
The following diagram shows the architecture of using Amazon Redshift-managed VPC endpoints on a different AWS account (applicable in case of provisioned Redshift).
For this post, we discuss a use case in which an end-user such as a data engineer or data analyst uses an open-source SQL editor (SQL Workbench/J) to connect to a private cluster from a customer-facing subnet in another VPC.
Before Amazon Redshift-managed VPC endpoints
In our use case, before Amazon Redshift-managed VPC endpoints, the network admin monitored the IP address of the Amazon Redshift leader node and updated the load balancers to route to it correctly, so that the data analyst had a working connection string for their BI tool (see the following architectural diagram).
After Amazon Redshift-managed VPC endpoints
After Amazon Redshift-managed VPC endpoints were configured (see the following architectural diagram), the network admin no longer needs to manage the cluster or the load balancer, because this is managed by Amazon Redshift. The network admin can provide the connection string to the data analyst after the initial setup, and allow the data analyst to connect to their BI tool without further involvement from the network admin team.
Using this network architecture allows you to simplify the design, while increasing security by limiting the access to your private subnets and only allowing select clients through your endpoint.
Set up the solution
To implement this solution, you need to complete the following high-level steps:
- Enable cluster relocation. (Skip this step for Serverless)
- Authorize access to additional accounts (optional). (Skip this step for Serverless, because cross-account, cross-VPC access is not supported yet.)
- Create target subnet groups. (Skip this step for Serverless as subnets can be selected while creating workgroup)
- Create an Amazon Redshift-managed VPC endpoint.
The following sections walk you through how to configure these components to access an Amazon Redshift cluster from an Amazon Redshift-managed VPC endpoint in the same account, and also highlight optional steps if your clients reside on another AWS account (in case of provisioned cluster).
Enable cluster relocation
Cluster relocation enables you to move a cluster to another Availability Zone without any loss of data or changes to your application.
You can enable cluster relocation through the AWS Management Console or the AWS Command Line Interface (AWS CLI) in two different ways: during cluster creation, or after launching the cluster. We walk you through both options, which you should complete from the cluster account.
Enable cluster relocation during cluster creation
If you’re creating a new cluster, complete the following steps:
- On the Amazon Redshift console, while creating the cluster, disable the Use defaults option under Additional configurations.
This exposes a set of options to override default behaviors.
- In the Backup section, for Cluster relocation, select Enable.
You can also enable cluster relocation during cluster creation through the AWS CLI API using the following commands:
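As a minimal sketch of what such a command can look like, the following Python snippet assembles an `aws redshift create-cluster` call with the `--availability-zone-relocation` flag and prints it for review. The cluster identifier, node type, node count, and credentials are placeholder assumptions, not values from this post.

```python
import shlex


def create_cluster_with_relocation(cluster_id, admin_user, admin_password,
                                   node_type="ra3.4xlarge", nodes=2):
    """Build an `aws redshift create-cluster` command with relocation enabled."""
    return [
        "aws", "redshift", "create-cluster",
        "--cluster-identifier", cluster_id,
        "--node-type", node_type,           # relocation requires an RA3 node type
        "--number-of-nodes", str(nodes),
        "--master-username", admin_user,
        "--master-user-password", admin_password,
        "--availability-zone-relocation",   # turns cluster relocation on
    ]


# Placeholder values for illustration only.
cmd = create_cluster_with_relocation("my-redshift-cluster", "awsuser", "REPLACE_ME")
print(shlex.join(cmd))
```

The snippet only prints the command, so you can inspect it before running it in a shell that has AWS credentials configured.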
Enable cluster relocation on an existing cluster
If you’re modifying an existing cluster, complete the following steps:
- On the Amazon Redshift console, choose the cluster.
- On the Backup tab, choose Edit.
- In the edit window, enable cluster relocation.
You can also enable cluster relocation through the AWS CLI API:
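For an existing cluster, the equivalent call is `aws redshift modify-cluster`. The sketch below builds that command in Python; the cluster identifier is a placeholder, and the `enabled` switch is a convenience added here to show the matching `--no-availability-zone-relocation` flag.

```python
import shlex


def relocation_command(cluster_id, enabled=True):
    """Build an `aws redshift modify-cluster` command that toggles relocation."""
    flag = "--availability-zone-relocation" if enabled else "--no-availability-zone-relocation"
    return ["aws", "redshift", "modify-cluster",
            "--cluster-identifier", cluster_id, flag]


# Placeholder cluster identifier for illustration only.
enable_cmd = relocation_command("my-redshift-cluster")
disable_cmd = relocation_command("my-redshift-cluster", enabled=False)
print(shlex.join(enable_cmd))
```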
Authorize access to additional AWS accounts (optional)
If you want to allow additional AWS accounts to create endpoints for your cluster, this section walks through the steps required to authorize access. If your endpoints reside in the same account as the cluster, you can skip this section because the same account is authorized by default.
- On the Amazon Redshift console, choose the cluster.
- On the Properties tab, in the Granted accounts section, choose Grant access.
- For AWS Account ID, enter the target AWS account ID.
- Select Grant access only to specific VPCs.
- For Virtual private cloud (VPC), you can choose to restrict access to specific VPCs or to the entire account.
- Choose Grant access.
You have now authorized your cluster to deploy endpoints in additional accounts with the option to specify target VPCs.
You can also allow additional accounts to deploy cluster endpoints through the AWS CLI API. To grant access to all the VPCs in the target account, enter the following code:
To grant access to a specific VPC in the target account, enter the following code:
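Both grants can be expressed with `aws redshift authorize-endpoint-access`: omitting `--vpc-ids` grants access to every VPC in the target account, while passing it restricts the grant. The following sketch builds either form; the account ID, cluster identifier, and VPC ID are placeholder assumptions.

```python
import shlex


def authorize_endpoint_access(cluster_id, account_id, vpc_ids=()):
    """Build an `aws redshift authorize-endpoint-access` command.

    With no vpc_ids, the grant covers all VPCs in the target account.
    """
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("AWS account IDs are 12 digits")
    cmd = ["aws", "redshift", "authorize-endpoint-access",
           "--cluster-identifier", cluster_id,
           "--account", account_id]
    if vpc_ids:
        cmd += ["--vpc-ids", *vpc_ids]
    return cmd


# Placeholder identifiers for illustration only.
all_vpcs = authorize_endpoint_access("my-redshift-cluster", "123456789012")
one_vpc = authorize_endpoint_access("my-redshift-cluster", "123456789012",
                                    vpc_ids=["vpc-0123456789abcdef0"])
print(shlex.join(all_vpcs))
print(shlex.join(one_vpc))
```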
Create target subnet groups
After you authorize access, you need to define subnet groups in the target account in which the endpoint should be deployed.
- In the target account, on the Amazon Redshift console, choose Configurations.
- Choose Subnet groups and choose Create subnet group.
- For Name, enter a name for your subnet group.
- In the Subnets section, specify the target VPC and choose the appropriate subnets to deploy the endpoint against.
For this post, we use two public subnets on a second VPC, but you can enter subnets as appropriate to your use case.
- Choose Create cluster subnet group.
You can also create a subnet group through the AWS CLI API:
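A hedged sketch of the CLI equivalent: `aws redshift create-cluster-subnet-group` takes a group name, a description, and one or more subnet IDs. The name, description, and subnet IDs below are placeholders.

```python
import shlex


def create_subnet_group(name, description, subnet_ids):
    """Build an `aws redshift create-cluster-subnet-group` command."""
    if not subnet_ids:
        raise ValueError("at least one subnet ID is required")
    return ["aws", "redshift", "create-cluster-subnet-group",
            "--cluster-subnet-group-name", name,
            "--description", description,
            "--subnet-ids", *subnet_ids]


# Placeholder values for illustration only.
subnet_cmd = create_subnet_group(
    "endpoint-subnet-group",
    "Subnets for the Redshift-managed VPC endpoint",
    ["subnet-0aaa1111bbb22222c", "subnet-0ddd3333eee44444f"],
)
print(shlex.join(subnet_cmd))
```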
Create an Amazon Redshift-managed VPC endpoint for Amazon Redshift
You’re now ready to create the endpoint for the Amazon Redshift cluster.
For Redshift Provisioned, follow the instructions below:
- In the target account, on the Amazon Redshift console, choose Configurations.
- Choose Create endpoint.
- In the Endpoint settings section, specify the target account and choose the appropriate VPC and subnet group to deploy the endpoint against.
As part of this step, you must provide a security group to use as a part of your endpoint. This is the critical step in which you can define a secure endpoint to limit what ports, protocols, and sources for inbound traffic you’re authorizing into your endpoint. The common practice is to allow port 5439 (Amazon Redshift connectivity port) to the security group or CIDR range in which your consumption workloads run.
- Choose Create endpoint.
You can also create the Amazon Redshift-managed VPC endpoint through the AWS CLI API:
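The sketch below assembles an `aws redshift create-endpoint-access` command for a provisioned cluster. It assumes the subnet group and security group already exist; every identifier is a placeholder. `--resource-owner` is the account ID that owns the cluster and is only needed when the cluster lives in a different account, so it is optional here.

```python
import shlex


def create_endpoint_access(endpoint_name, cluster_id, subnet_group,
                           security_group_ids, resource_owner=None):
    """Build an `aws redshift create-endpoint-access` command."""
    cmd = ["aws", "redshift", "create-endpoint-access",
           "--endpoint-name", endpoint_name,
           "--cluster-identifier", cluster_id,
           "--subnet-group-name", subnet_group,
           "--vpc-security-group-ids", *security_group_ids]
    if resource_owner:
        # Needed only when the cluster is owned by another AWS account.
        cmd += ["--resource-owner", resource_owner]
    return cmd


# Placeholder values for illustration only.
endpoint_cmd = create_endpoint_access(
    "analyst-endpoint", "my-redshift-cluster", "endpoint-subnet-group",
    ["sg-0123456789abcdef0"], resource_owner="123456789012",
)
print(shlex.join(endpoint_cmd))
```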
For Redshift Serverless, follow the instructions below:
- First, create a security group to associate with the Redshift-managed VPC endpoint.
The common practice is to allow port 5439 (Amazon Redshift connectivity port) to the security group or CIDR range in which your consumption workloads run.
In the security group associated with the Redshift cluster, add an inbound rule with Type as Redshift, Protocol as TCP, Port range as 5439 (Amazon Redshift connectivity port), and Source as the CIDR range in which your consumption workloads run.
- On the Amazon Redshift console, open the workgroup and go to Redshift-managed VPC endpoints.
- Choose Create endpoint.
- In the Endpoint settings section, choose the appropriate VPC, associated private subnet and security group created above to deploy the endpoint against.
This is a critical step in which you can define a secure endpoint to limit what ports, protocols, and sources for inbound traffic you are authorizing into your endpoint.
- Choose Create endpoint.
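The Serverless steps above can be sketched with two CLI calls: `aws ec2 authorize-security-group-ingress` for the inbound rule on port 5439, and `aws redshift-serverless create-endpoint-access` for the endpoint itself. The security group ID, CIDR range, workgroup name, and subnet ID are placeholder assumptions.

```python
import shlex


def allow_redshift_port(security_group_id, source_cidr, port=5439):
    """Inbound rule: TCP on the Redshift port from the workload CIDR range."""
    return ["aws", "ec2", "authorize-security-group-ingress",
            "--group-id", security_group_id,
            "--protocol", "tcp",
            "--port", str(port),
            "--cidr", source_cidr]


def create_serverless_endpoint(endpoint_name, workgroup, subnet_ids, security_group_ids):
    """Redshift-managed VPC endpoint for a Serverless workgroup."""
    return ["aws", "redshift-serverless", "create-endpoint-access",
            "--endpoint-name", endpoint_name,
            "--workgroup-name", workgroup,
            "--subnet-ids", *subnet_ids,
            "--vpc-security-group-ids", *security_group_ids]


# Placeholder values for illustration only.
ingress_cmd = allow_redshift_port("sg-0123456789abcdef0", "10.1.0.0/16")
serverless_cmd = create_serverless_endpoint(
    "analyst-endpoint", "my-workgroup",
    ["subnet-0aaa1111bbb22222c"], ["sg-0123456789abcdef0"],
)
print(shlex.join(ingress_cmd))
print(shlex.join(serverless_cmd))
```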
Use the Amazon Redshift-managed endpoint
After you create the endpoint, you can see your endpoint on the Configurations page on the Amazon Redshift console. Choosing the endpoint displays the endpoint information and different connection strings based on the consumption strategy.
For provisioned:
You can also describe the Amazon Redshift-managed VPC endpoint through the AWS CLI API:
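A minimal sketch of that lookup, using `aws redshift describe-endpoint-access` with a placeholder endpoint name. The `--query` expression is an assumption about which fields you want to see; it selects the endpoint address, port, and status from the `EndpointAccessList` in the response.

```python
import shlex


def describe_endpoint_access(endpoint_name):
    """Build an `aws redshift describe-endpoint-access` command for one endpoint."""
    return ["aws", "redshift", "describe-endpoint-access",
            "--endpoint-name", endpoint_name,
            # JMESPath filter: keep only the connection details.
            "--query", "EndpointAccessList[0].[Address,Port,EndpointStatus]",
            "--output", "json"]


# Placeholder endpoint name for illustration only.
describe_cmd = describe_endpoint_access("analyst-endpoint")
print(shlex.join(describe_cmd))
```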
For serverless:
Following our use case, we use an Amazon Elastic Compute Cloud (Amazon EC2) instance running SQL Workbench/J on our target account, which our data analysts use to query Amazon Redshift securely. We can deploy the endpoint under multiple network topologies; we provide some common examples in this section.
Cross-VPC access for internet-based workloads without VPC peering
In this scenario, the data analysts access a workspace located in our target account over the internet, where they can start SQL Workbench/J or an equivalent application running in the public subnet. The Amazon Redshift-managed VPC endpoint for the Amazon Redshift cluster is deployed to the same VPC but on the private subnet, and the cluster is deployed to the private subnet of the cluster account. The following diagram illustrates this architecture.
Cross-VPC access for internet-based workloads with VPC peering
In this scenario, the data analysts access a workspace located in our target account over the internet, where they can start SQL Workbench/J or an equivalent solution deployed in the public subnet. Unlike the previous example, the Amazon Redshift-managed VPC endpoint for the Amazon Redshift cluster is deployed in the public subnet of the same VPC as the Amazon Redshift cluster, which requires the target account and cluster account to be peered in order to expose routes between them. The cluster is deployed to the private subnet of the cluster account. The following diagram illustrates this architecture.
Cross-VPC access for on-premises workstations
In this scenario, the data analysts access a workspace located on-premises on which SQL Workbench/J or an equivalent tool is deployed. Their networking team needs to configure routing between their on-premises network and AWS, commonly done through AWS Direct Connect and AWS Transit Gateway, which can route traffic to the public subnet of the cluster account that contains the Amazon Redshift-managed VPC endpoint for the Amazon Redshift cluster. The cluster is deployed to the private subnet of the same account (see the following architectural diagram).
Configure cross-VPC access
For this post, we demonstrate how to configure the first scenario—cross-VPC access for internet-based workloads without VPC peering. To achieve this, we complete the following steps:
- Configure the route table.
- Associate the security group used for the Redshift-managed VPC endpoint with the EC2 instance.
- Deploy and test the connection.
Configure the route table
Depending on how you choose to deploy your endpoint and clients, you may need to make changes to your route table to allow traffic between the networks. For some cases, such as when the endpoint resides on the same VPC as the SQL Workbench/J client, no additional changes are needed because the route is local to your VPC. In others, such as with internet-based workloads with VPC peering, you may need to make additional changes, such as allowing routes to traverse through the peered connection. In cases where you need to communicate to on-premises networks, you may need to configure your route tables to send traffic through an AWS Transit Gateway adapter. For more information about these different configurations, see Example routing options.
Configure a security group for the EC2 instance
Next, we attach the security group that is associated with the Redshift-managed VPC endpoint to the EC2 instance on which we deploy SQL Workbench/J, and we arrange access to that instance from our workstation. Depending on your specific use case, several options are available, such as the following:
- Connect using EC2 Instance Connect
- Start a session using AWS Systems Manager, Amazon EC2, or the AWS CLI
- Use Remote Desktop Gateway on AWS
For this post, we present a simple solution by allowing RDP access from our current IP address.
In the EC2 security group, add a new inbound rule and choose Type as RDP, Protocol as TCP, Port range as 3389, and Source as your IP address.
If you’re not sure what your IP address is, you can search “what is my IP” in your preferred search engine to get a result with your public IP address.
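The same inbound rule can be sketched as an `aws ec2 authorize-security-group-ingress` command. The snippet validates the address with Python's `ipaddress` module and narrows the source to a single host (`/32`). The security group ID and the IP address (taken from the documentation range 203.0.113.0/24) are placeholders.

```python
import ipaddress
import shlex


def rdp_ingress_command(security_group_id, public_ip):
    """Allow RDP (TCP 3389) from exactly one IPv4 address."""
    host = ipaddress.IPv4Address(public_ip)  # raises ValueError if malformed
    return ["aws", "ec2", "authorize-security-group-ingress",
            "--group-id", security_group_id,
            "--protocol", "tcp",
            "--port", "3389",
            "--cidr", f"{host}/32"]


# Placeholder values for illustration only.
rdp_cmd = rdp_ingress_command("sg-0123456789abcdef0", "203.0.113.10")
print(shlex.join(rdp_cmd))
```

Restricting the source to a single address keeps the RDP port closed to the rest of the internet.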
Deploy and test the connection
Finally, we can launch a Windows instance to deploy our visual editor.
- On the Amazon EC2 console, choose Launch instance.
- Choose the relevant Windows AMI provided by your organization (for this post, we use the Microsoft Windows Server 2019 Base image provided by Amazon and the t2.large size).
- Choose the VPC you previously configured and the target subnet where your users access the environment.
- Choose the security group you created in the previous step and a key pair to launch the instance with.
- When the instance is running, retrieve the Windows password and connect to it. For instructions, see Connect to your Windows instance.
- When you’re connected, download your visual editor and drivers.
For this post, we connect to our cluster with SQL Workbench/J. Be sure to use the JDBC URL of the endpoint and append the database name at the end of your Amazon Redshift-managed VPC endpoint connection (for this post, we use /dev).
Test the connection to verify connectivity.
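As a small illustration of the URL shape, the following sketch joins an endpoint address, the port, and the database name into a Redshift JDBC URL. The endpoint address is a made-up placeholder; use the address shown for your own endpoint on the console.

```python
def redshift_jdbc_url(endpoint_address, database="dev", port=5439):
    """Compose a JDBC URL of the form jdbc:redshift://host:port/database."""
    return f"jdbc:redshift://{endpoint_address}:{port}/{database}"


# Hypothetical endpoint address for illustration only.
url = redshift_jdbc_url("analyst-endpoint.example.us-east-1.redshift.amazonaws.com")
print(url)
```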
For Provisioned:
For serverless:
At this point, you can connect and run queries securely against your Amazon Redshift cluster using your Amazon Redshift-managed VPC endpoint.
Conclusion
In this post, we walked through reference access patterns that simplify adding a layer of security when accessing your private Amazon Redshift clusters (provisioned and serverless) from clients running in another VPC in the same account, in a VPC in another account, or even on-premises. Amazon Redshift-managed VPC endpoints not only let you expose managed endpoints to access resources on different subnets, but also provide an additional security enforcement point to limit access to your cluster to only known access patterns.
If you have any questions, please leave a comment.
About the Authors
Bernie Herdan is a Global Accounts Solutions Architect with AWS Global Financial Services based in New York. He works with large financial institutions to help them build secure and scalable solutions on AWS, to drive innovation and achieve their business outcomes.
Debu Panda, a senior product manager at AWS, is an industry leader in analytics, application platform, and database technologies and has more than 20 years of experience in the IT world.
Patrick Huang is a senior software engineer for Amazon Redshift, where he leads and builds cutting-edge features for the Redshift cloud infrastructure. He is passionate about learning new technologies and innovating for the Redshift service. He holds a B.A. in Computer Science from the University of California, Berkeley. Outside of work, he enjoys playing basketball.
Saravanaraj Velusamy is a Senior Software Engineer at Amazon Redshift, where he works on building next generation features for Redshift. More recently his work focuses on the areas at the intersection of security, networking and databases. Outside of work, he likes to read and reflect on teachings from ancient Greek and Indian schools of philosophy, play frisbee and practice yoga.
Prasanna Sridharan is a Senior Data & Analytics Architect with AWS. He is passionate about building the right big data solution for AWS customers. He specializes in the design and implementation of Analytics, Data Management and Big Data systems, mainly for Enterprise and FSI customers.
Radhika Jakkula is a Big Data Prototyping Solutions Architect at AWS. She helps customers build prototypes using AWS analytics services and purpose-built databases. She is a specialist in assessing a wide range of requirements and applying relevant AWS native services, Big Data tools and frameworks to create a robust architecture.