Networking & Content Delivery

Private network for data movement in generative AI

In this post, we cover the architecture patterns for building secure, private network connectivity for data movement in generative artificial intelligence (generative AI) using Amazon Web Services (AWS) and AWS Partner Network (APN) services.

Data privacy and security are top of mind for customers exploring generative AI initiatives. AWS provides services that give you control over your data and help you meet your data privacy and security requirements. For example, you can use AWS Identity and Access Management (IAM) to manage inference access, deny access to specific models, and control console access to these services. You can use AWS managed keys and customer managed keys to control access to fine-tuned custom models. You can monitor API activity and troubleshoot issues as you build solutions using AWS CloudTrail, and protect against malicious traffic using AWS WAF. You can also use AWS PrivateLink, which provides secure, private IP connectivity through VPC endpoints, for data access during training, fine-tuning, and inferencing with generative AI applications.
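As an illustration of the model-level access control mentioned above, the following is a minimal Python (boto3) sketch that creates an IAM policy denying inference access to one specific model. The policy name and model ARN are placeholders, not values prescribed by this post.

import json
import boto3

iam = boto3.client("iam")

# Deny InvokeModel for one specific Amazon Bedrock foundation model.
# The model ARN is illustrative; substitute the model you want to block.
deny_model_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Action": [
                "bedrock:InvokeModel",
                "bedrock:InvokeModelWithResponseStream",
            ],
            "Resource": "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-text-express-v1",
        }
    ],
}

iam.create_policy(
    PolicyName="DenySpecificBedrockModel",  # illustrative name
    PolicyDocument=json.dumps(deny_model_policy),
)

Attach the resulting policy to the users or roles that should not invoke that model; an explicit Deny overrides any Allow they might otherwise have.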

In this post, we dive into how AWS PrivateLink can help with secure networking for Retrieval Augmented Generation (RAG) based generative AI inferencing use cases.

Requirements for secure networking in generative AI

You want to avoid sending sensitive data, such as personally identifiable information (PII), over the public internet, and you want to maintain secure data access for training, fine-tuning, and inferencing or querying foundation models (FMs) and large language models (LLMs) during both testing and production. You also want to reduce the attack surface of generative AI applications that use large volumes of intellectual property and proprietary data. Equally important are end-to-end private IP network connections for data movement between AWS and your on-premises infrastructure, which help you avoid legal and compliance exposure and violations of regulatory requirements such as HIPAA, HITRUST, and PCI DSS.

AWS PrivateLink for secure networking in generative AI

One way to address secure private IP networking for generative AI is by creating logically isolated virtual private clouds (VPCs) without an internet gateway and using AWS PrivateLink for all data movement in the generative AI pipeline, including data used for training, fine-tuning, and inferencing or querying. By creating VPC interface endpoints, powered by AWS PrivateLink, you can securely access AWS services and third-party independent software vendor (ISV) software-as-a-service (SaaS) offerings that support a PrivateLink endpoint service.

Additionally, you can enable private connectivity for your generative AI SaaS application by creating a PrivateLink endpoint service. When you invoke the private endpoint for an AWS or ISV SaaS service, the service hostname is resolved to the interface endpoint created in your VPC, and traffic flows through a private IP network over the AWS backbone. AWS handles the DNS resolution for PrivateLink in the background without requiring you to set up a separate private hosted zone in your VPC.
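To make this concrete, here is a minimal boto3 sketch that creates a VPC interface endpoint for the Amazon Bedrock runtime with private DNS enabled. All resource IDs are placeholders for resources in your own VPC.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0123456789abcdef0",              # placeholder VPC ID
    ServiceName="com.amazonaws.us-east-1.bedrock-runtime",
    SubnetIds=["subnet-0123456789abcdef0"],     # placeholder subnet ID
    SecurityGroupIds=["sg-0123456789abcdef0"],  # placeholder security group
    # With private DNS enabled, the default regional service hostname
    # resolves to the endpoint's private IPs inside this VPC.
    PrivateDnsEnabled=True,
)
print(response["VpcEndpoint"]["VpcEndpointId"])

With PrivateDnsEnabled set to True, SDK clients in the VPC reach the service over private IPs without any code or configuration changes.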

In the subsequent sections, we provide best practice guidelines on how generative AI SaaS providers can use AWS PrivateLink and its security controls to build secure networking for RAG-based generative AI inferencing workloads. The guidelines also encompass secure usage of data for RAG, which relies on retrieving additional proprietary data from the generative AI SaaS provider's data sources, such as data warehouses, Kafka clusters, Amazon OpenSearch Service clusters, and Amazon Relational Database Service (Amazon RDS), to reduce hallucination and improve inferencing accuracy. All data traffic uses private IP addresses and remains within your AWS private network. Secure networking for generative AI extends end-to-end private connectivity between services in AWS and on-premises infrastructure for hybrid generative AI workloads.

Reference architecture guidelines for secure private connection for RAG

RAG requires vector embeddings for your proprietary data, stored in a vector data store along with your original data. RAG converts the inferencing query to a vector, uses it to search for semantically similar embeddings in the vector data store, and then retrieves the original data for matching embeddings to augment the response generation. There are two options for deploying RAG:

  • Option #1: Explicitly invoke AWS or third-party vector data stores for retrieval and then invoke the FMs for generation.
  • Option #2: Use Knowledge Bases for Amazon Bedrock, which performs retrieval and generation in a single operation.

The secure RAG implementation uses private connections for all data movement related to retrieval and generation. In the following sections, we provide reference architecture guidelines for implementing secure RAG in AWS, specifically for the generative AI inferencing use case.
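Before we cover the networking, the following sketch illustrates the retrieval-then-generation mechanics described above. The embedding and text model IDs are illustrative choices, and vector_store stands in for whichever vector data store client you use; this is a sketch of the pattern, not a production implementation.

import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def embed(text):
    # Convert the inferencing query into a vector embedding.
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v1",  # illustrative embedding model
        body=json.dumps({"inputText": text}),
    )
    return json.loads(resp["body"].read())["embedding"]

def retrieve(vector_store, query_vector, k=3):
    # Search for semantically similar embeddings and return the original
    # documents for the matches. vector_store is a hypothetical client.
    return vector_store.similarity_search(query_vector, top_k=k)

def generate(query, context_docs):
    # Augment the prompt with retrieved context, then invoke the FM.
    prompt = "Context:\n" + "\n".join(context_docs) + "\n\nQuestion: " + query
    resp = bedrock.invoke_model(
        modelId="amazon.titan-text-express-v1",  # illustrative text model
        body=json.dumps({"inputText": prompt}),
    )
    return json.loads(resp["body"].read())["results"][0]["outputText"]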

Option #1: Secure RAG through vector data stores

This architecture provides secure networking guidelines for RAG in generative AI by explicitly invoking AWS or third-party vector data stores for retrieval. The retrieved data is then used for inferencing with FMs available through Amazon Bedrock, Amazon SageMaker, or third-party model providers. The vector data stores and FMs are invoked using VPC interface endpoints. Amazon Bedrock and SageMaker support PrivateLink connectivity. For vector data stores from providers such as DataStax, MongoDB, and Snowflake, and for FMs from ISVs such as H2O.ai, the responsibility lies with the service provider to offer a PrivateLink endpoint service that enables the service consumer to connect to their data stores and FMs through VPC interface endpoints.

First, here are the high-level requirements to enable the following architecture, with links to detailed descriptions for implementing each component.

As the generative AI SaaS provider, you will:

  • Create a vector data store in AWS or with a third-party service provider.
  • Create VPC interface endpoints in your VPC to access the vector data store provider over a private IP network.
  • Create VPC interface endpoints in your VPC to access Amazon Bedrock, SageMaker, or a third-party FM provider.
  • Additionally, with AWS services that support private endpoints and endpoint policies, you can create a VPC endpoint policy and attach it to the VPC interface endpoint to restrict access to specific principals performing specific actions on AWS services (see the sketch after this list).
  • Set up a PrivateLink endpoint service for your generative AI SaaS application and share the endpoint service DNS with the client. For details on sharing your application over PrivateLink, see the AWS PrivateLink documentation.
  • Optionally, you can also enable private DNS, allowing your clients to connect using the well-known vanity DNS name for your generative AI SaaS service without rewriting their application code.
  • You can use permissions and acceptance settings on your endpoint service to further restrict access to your application to specific clients (AWS principals).
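The boto3 sketch below illustrates three of the provider-side items above: attaching an endpoint policy to an interface endpoint, publishing the SaaS application as a PrivateLink endpoint service, and restricting which principals can connect. Every ID, ARN, and account number is a placeholder.

import json
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# 1) Attach an endpoint policy that limits who can do what over the endpoint.
endpoint_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:role/app-role"},
            "Action": "bedrock:InvokeModel",
            "Resource": "arn:aws:bedrock:us-east-1::foundation-model/*",
        }
    ],
}
ec2.modify_vpc_endpoint(
    VpcEndpointId="vpce-0123456789abcdef0",  # placeholder endpoint ID
    PolicyDocument=json.dumps(endpoint_policy),
)

# 2) Publish the SaaS application (fronted by a Network Load Balancer)
#    as a PrivateLink endpoint service that requires connection acceptance.
service = ec2.create_vpc_endpoint_service_configuration(
    NetworkLoadBalancerArns=[
        "arn:aws:elasticloadbalancing:us-east-1:111122223333:loadbalancer/net/genai-saas/abc123"
    ],
    AcceptanceRequired=True,
)

# 3) Allow only specific client principals to connect to the service.
ec2.modify_vpc_endpoint_service_permissions(
    ServiceId=service["ServiceConfiguration"]["ServiceId"],
    AddAllowedPrincipals=["arn:aws:iam::444455556666:root"],
)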

As the client, you will:

  • Create VPC interface endpoints in your VPC to connect to the generative AI SaaS provider's PrivateLink endpoint service. If you connect from an on-premises network, extend private connectivity into that VPC, for example over AWS Direct Connect or AWS Site-to-Site VPN.

Once you have created logically isolated VPCs without any internet path, you can use the additional security controls available with PrivateLink to further improve your security posture. You can use an IAM identity policy to grant users permission to work with VPC interface endpoints, such as creating or deleting them. You can refine this further by restricting VPC interface endpoint creation based on the service owner (Amazon, ISV, account ID) or on the service private DNS name. Once a VPC interface endpoint is created, you can define additional traffic flow controls using security groups for VPC endpoints, restricting the traffic that is allowed over the VPC interface endpoint.
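For example, the documented ec2:VpceServiceName condition key lets you scope endpoint creation to an approved service. The sketch below creates such an identity policy; the policy name and service name are placeholders.

import json
import boto3

iam = boto3.client("iam")

# Allow interface endpoint creation only for one approved endpoint service.
endpoint_creation_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "ec2:CreateVpcEndpoint",
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "ec2:VpceServiceName": "com.amazonaws.vpce.us-east-1.vpce-svc-0123456789abcdef0"
                }
            },
        }
    ],
}

iam.create_policy(
    PolicyName="RestrictVpcEndpointCreation",  # illustrative name
    PolicyDocument=json.dumps(endpoint_creation_policy),
)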

The traffic flow can be broken down into two parts. The first part is from the client to the generative AI SaaS provider, when a client sends an inferencing query from either the on-premises network or from within their VPC in AWS to the generative AI SaaS application supporting a PrivateLink endpoint service. The second part is from the application to the vector data store and FM services.

To help illustrate, let's first look at the client-to-application traffic flow. In this example architecture, the client uses VPC interface endpoints in their VPC to connect to the generative AI SaaS provider's PrivateLink endpoint service.

Figure 1 shows an example architecture for secure private connectivity between the client and the generative AI SaaS provider.

Figure 1: Private connectivity between the client and the generative AI SaaS provider

1: The client sends an inferencing query to the application from either their VPC in AWS or from the on-premises infrastructure through the VPC interface endpoints in their VPC. The query is evaluated against the security group rules attached to the interface endpoints.

2: The inferencing query is sent through PrivateLink to the application. The application could be in the same account as the client initiating the query, or in a different account and a different AWS organization. The assumption in this post is that the client and generative AI SaaS provider are in different AWS organizations.
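For illustration, a client-side request might look like the following sketch. The hostname is a hypothetical vanity DNS name that resolves to the interface endpoint's private IPs when the provider enables private DNS, and the request path and payload are assumptions about the SaaS API.

import requests  # third-party HTTP client library

# Hypothetical vanity DNS name for the provider's endpoint service.
SAAS_ENDPOINT = "https://api.genai-saas.example.com/v1/infer"

response = requests.post(
    SAAS_ENDPOINT,
    json={"query": "Summarize the termination clauses in our supplier contracts."},
    timeout=30,
)
print(response.json())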

In the next section, we walk through the traffic flow over a secure, private connection between you as the generative AI SaaS provider and the services that you invoke in the backend.

Figure 2 shows an example architecture for secure private connectivity between generative AI SaaS provider and AWS and ISV services in the backend.

Figure 2: Private connectivity between generative AI SaaS provider and services in the backend

3: The application sends the vectorized query over the VPC interface endpoints to the vector database provider SaaS or AWS service.

4: The vector database provider SaaS or AWS service retrieves additional context from the generative AI SaaS provider’s vector data store and responds to the application.
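As one example of steps 3 and 4, the sketch below runs a k-NN search against an Amazon OpenSearch Service domain reached over its VPC endpoint. The domain endpoint, index name, and field name are illustrative, authentication is omitted for brevity, and opensearch-py is just one of several clients you could use.

from opensearchpy import OpenSearch

# Hypothetical VPC-resolvable domain endpoint; authentication omitted.
client = OpenSearch(
    hosts=[{"host": "vpc-genai-xyz.us-east-1.es.amazonaws.com", "port": 443}],
    use_ssl=True,
)

query_vector = [0.12, -0.08, 0.33]  # truncated embedding, for illustration

results = client.search(
    index="docs-embeddings",  # illustrative index and field names
    body={
        "size": 3,
        "query": {"knn": {"embedding": {"vector": query_vector, "k": 3}}},
    },
)
for hit in results["hits"]["hits"]:
    print(hit["_source"]["text"])  # original data for matching embeddings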

(Note: steps 5–7 in the diagram show two options as examples: AWS generative AI services such as Amazon Bedrock and SageMaker, and a third-party FM provider service.)

5: The application sends both the query and the context retrieved from the vector databases to the FMs over the VPC interface endpoints for inferencing. Depending on whether the FM is available through Amazon Bedrock, Amazon SageMaker, or a third-party FM provider service in AWS, the application invokes the corresponding API to send the request to the FM.
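The Amazon Bedrock case mirrors the earlier invoke_model sketch. For an FM hosted on a SageMaker endpoint, the call could look like the following sketch; the endpoint name and payload shape are assumptions about your deployment.

import json
import boto3

# SageMaker runtime also supports PrivateLink; with an interface endpoint
# and private DNS enabled, this call stays on private IPs.
smr = boto3.client("sagemaker-runtime", region_name="us-east-1")

payload = {
    "inputs": "Context: <retrieved documents>\n\nQuestion: What changed in Q3?",
}

resp = smr.invoke_endpoint(
    EndpointName="genai-fm-endpoint",  # hypothetical endpoint name
    ContentType="application/json",
    Body=json.dumps(payload),
)
print(json.loads(resp["Body"].read()))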

6: The FM processes the query and additional context and generates the intelligent response.

7: The intelligent response is returned to the application.

Figure 3 shows the architecture diagram that consolidates all the preceding steps for secure private connectivity for RAG using AWS and ISV vector data stores.

Figure 3: Secure RAG using AWS and ISV services

8: The application returns the intelligent response to the client that originated the inferencing request.

9: The originating client receives the inferred intelligent response.

Option #2: Secure RAG through Knowledge Bases for Amazon Bedrock

In this section, we show the architecture pattern for generative AI inferencing using the VPC interface endpoints to access vector data stores integrated with Knowledge Bases for Amazon Bedrock. At the time of this writing, private clusters in Amazon Aurora Serverless, Amazon OpenSearch Serverless, and MongoDB Atlas are integrated with Knowledge Bases for Amazon Bedrock. For an up-to-date list of private clusters for vector data stores integrated with Knowledge Bases for Amazon Bedrock, see the Amazon Bedrock User Guide.

The difference from Option #1 is in how the generative AI SaaS provider uses Knowledge Bases for Amazon Bedrock for RAG. Let's briefly cover how a generative AI application interacts with Knowledge Bases for Amazon Bedrock. The RetrieveAndGenerate API integrates RAG and FM query processing into a single operation, which simplifies RAG implementation by eliminating separate retrieval and generation steps. By invoking the RetrieveAndGenerate API through VPC interface endpoints, you can maintain secure private connectivity for RAG.
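As a minimal sketch of this single-operation pattern, assuming a placeholder knowledge base ID and model ARN:

import boto3

# The RetrieveAndGenerate API is served by the bedrock-agent-runtime service.
agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

resp = agent_runtime.retrieve_and_generate(
    input={"text": "What are the renewal terms in our vendor contracts?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KBID123456",  # placeholder knowledge base ID
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-v2",
        },
    },
    # Optional: encrypt the session with a customer managed KMS key,
    # as discussed in step 3 below.
    # sessionConfiguration={"kmsKeyArn": "arn:aws:kms:us-east-1:111122223333:key/<key-id>"},
)
print(resp["output"]["text"])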

First, here are the high-level requirements to enable the architecture in Figure 4, with links to detailed descriptions on implementing each component.

As the generative AI SaaS provider, you will:

  • Create a knowledge base in Amazon Bedrock backed by one of the supported private vector data store clusters listed earlier.
  • Create VPC interface endpoints in your VPC to invoke Knowledge Bases for Amazon Bedrock (the RetrieveAndGenerate API) over a private IP network.
  • Set up a PrivateLink endpoint service for your generative AI SaaS application, as described in Option #1.

Details on creating knowledge bases for each client or isolating client-specific source data in the knowledge base are outside the scope of this post. Subsequent posts on the AWS Networking & Content Delivery Blog will cover this topic.

As the client:

  • There are no changes to the client setup. Follow the detailed description from Option #1.

Figure 4 shows an example architecture for secure private connectivity for RAG while using Knowledge Bases for Amazon Bedrock.

Figure 4: Secure RAG using Knowledge Bases for Amazon Bedrock

The traffic from the client to the application in steps 1 and 2 is identical to what we saw in Option #1. However, for the generative AI SaaS provider, the traffic flow changes slightly.

3: The application invokes the RetrieveAndGenerate API and sends the query over VPC interface endpoints to Knowledge Bases for Amazon Bedrock. To encrypt this session using AWS Key Management Service (AWS KMS) keys, refer to Encryption of knowledge base retrieval.

4 and 5: Knowledge Bases for Amazon Bedrock accesses the vector data store over PrivateLink to search using the query and retrieve additional context.

6: Knowledge Bases for Amazon Bedrock then invokes the FM through Amazon Bedrock over PrivateLink and generates the intelligent response.

7: The intelligent response is returned to the application.

From the application to the client, the traffic flow is identical to steps 8 and 9, described in Option #1.

Conclusion

Take advantage of the reference architecture guidelines for secure RAG, build private network connectivity for data movement, and accelerate your generative AI transformation in AWS. For a walkthrough of how to deploy PrivateLink endpoints for Amazon Bedrock, see the post Use AWS PrivateLink to set up private access to Amazon Bedrock. To learn more about how to build a multi-tenant SaaS with tenant-specific metering and billing, see the post Build an internal SaaS service with cost and usage tracking for foundation models on Amazon Bedrock.