AWS Security Blog

Build secure network architectures for generative AI applications using AWS services

As generative AI becomes foundational across industries—powering everything from conversational agents to real-time media synthesis—it simultaneously creates new opportunities for bad actors to exploit. The complex architectures behind generative AI applications expose a large surface area including public-facing APIs, inference services, custom web applications, and integrations with cloud infrastructure. These systems are not immune to classic or emerging external threats. We have introduced a series of posts on securing generative AI, starting with Securing generative AI: An introduction to the Generative AI Security Scoping Matrix, which establishes a model for the risk and security implications based on the type of generative AI workload you are deploying and lays the foundation for the rest of our series.

This post continues the series, and provides guidance on how to build secure, scalable network architectures for generative AI applications on Amazon Web Services (AWS) through a defense-in-depth approach. You’ll learn how to protect your AI workloads while maintaining performance and reliability. We cover multiple security layers including virtual private cloud (VPC) isolation, network firewalls, application protection, and edge security controls that you can use to create a comprehensive defense strategy for generative AI workloads.

Common generative AI external threats

In this section, we review some of the most common external threats facing generative AI applications today.

Network level DDoS attacks (layer 4)

Network level distributed denial-of-service (DDoS) or volumetric attacks such as SYN floods, UDP floods, and ICMP floods, target the network layer by sending a flood of layer 4 requests to a server. The aim is to exhaust the server’s resources by initiating multiple half-open layer 4 connections, ultimately rendering the system unresponsive to legitimate users. For generative AI applications, which often require sustained sessions and low-latency responses, such exploits can severely disrupt availability and user experience. Another type of volumetric attack is reflection attacks, where threat actors exploit services such as DNS to amplify the volume of traffic sent to a target. A small request sent to a vulnerable third-party server is reflected and expanded into a large response directed at the victim. This technique is particularly dangerous when generative AI APIs are exposed to the public internet, because it can flood the endpoints with unexpected traffic, causing service degradation.

Web request flood (layer 7)

These sophisticated exploits on layer 7 mimic legitimate traffic patterns to evade traditional security filters. By overwhelming application endpoints with excessive HTTP requests, bad actors can cause compute exhaustion, especially in inference-heavy AI workloads. Unlike volumetric DDoS, these requests are often hard to distinguish from real users, making mitigation more complex.

Application-specific exploits

Bad actors increasingly focus on exploiting vulnerabilities in application-specific code or the systems on which the code runs—such as Apache, Nginx, or Tomcat. For generative AI applications, which often involve custom APIs and orchestration layers, even a small misconfiguration or unpatched component can open the door to unauthorized access, data leakage, or system compromise.

SQL injection

By injecting malicious SQL code through input fields or query parameters, bad actors can manipulate backend databases to exfiltrate or corrupt data. Generative AI apps that log prompts or store user interactions are especially susceptible if input sanitization is not enforced rigorously.
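Rigorous input handling is straightforward to enforce at the database layer. The following sketch uses Python's built-in sqlite3 module and an illustrative prompt_log table to show how parameterized queries keep hostile prompt text as data rather than executable SQL:

```python
import sqlite3


def log_prompt(conn: sqlite3.Connection, user_id: str, prompt: str) -> None:
    """Store a user prompt using a parameterized query.

    The ? placeholders bind user input as data, so SQL syntax inside
    the prompt is never interpreted by the database engine.
    """
    conn.execute(
        "INSERT INTO prompt_log (user_id, prompt) VALUES (?, ?)",
        (user_id, prompt),
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prompt_log (user_id TEXT, prompt TEXT)")
    # Even a hostile-looking prompt is stored verbatim, not executed:
    log_prompt(conn, "u1", "'); DROP TABLE prompt_log; --")
    print(conn.execute("SELECT prompt FROM prompt_log").fetchall())
```

The same principle applies to any driver or ORM: never concatenate prompt text into query strings.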

Cross-site scripting

Cross-site scripting (XSS) attacks involve injecting malicious scripts into trusted web pages. When unsuspecting users interact with these scripts, bad actors can hijack sessions, steal data, or redirect users to malicious sites. Frontend interfaces for AI services, especially dashboards or prompt consoles, are particularly vulnerable.

OWASP top application security risks

The OWASP Top 10 serves as a critical framework for identifying common security risks in web applications. These include issues such as broken access control, security misconfigurations, and insufficient logging and monitoring. Generative AI solutions must adhere to OWASP guidelines to mitigate the broader landscape of web application threats.

Common vulnerabilities and exposures

Security professionals must remain vigilant to known common vulnerabilities and exposures (CVEs) impacting AI stack components—ranging from open-source libraries to model-serving infrastructure. Ignoring CVEs can lead to exploits that compromise sensitive model outputs, internal APIs, or user data.

Malicious bots and crawlers

Malicious bots increasingly target AI applications to scrape content such as generated text, pricing data, proprietary models, or images behind paywalls. These bots can masquerade as legitimate crawlers or scanners but are designed to harvest content at scale, potentially violating terms of service and impacting infrastructure costs.

Content scrapers and probing tools

Automated tools that crawl, scrape, or scan generative AI systems are often used for competitive intelligence, model inversion, or discovering exposed endpoints. These tools can weaken privacy guarantees and expose AI behavior to unintended third parties.

Securing your generative AI applications

Here are some of the common strategies that you can use to help secure your generative AI applications using AWS services.

Private networking with Amazon Bedrock

Amazon Bedrock is a fully managed service provided by AWS that offers developers access to foundation models (FMs) and the tools to customize them for specific applications. Developers can use it to build and scale generative AI applications using FMs through an API, without managing infrastructure. A typical set of environments is shown in Figure 1. It has the following network components:

  • The Amazon Bedrock service accounts, which hold the service components and expose the service API endpoint within the same AWS Region as the customer’s account.
  • The customer’s AWS account, in which the application runs and from which it invokes the Amazon Bedrock API with the query request.
  • The customer’s corporate network within the existing data center, which is external to the AWS global network and holds a customer application that also needs to use Amazon Bedrock and can invoke the Amazon Bedrock API. AWS Direct Connect provides a dedicated network connection between an on-premises network and AWS, bypassing the public internet.

Figure 1 – Private networking architecture with Amazon Bedrock

You can use AWS PrivateLink to establish private connectivity between the FMs and the generative AI applications running in on-premises networks or your Amazon Virtual Private Cloud (Amazon VPC), without exposing your traffic to the public internet. In the case of Amazon VPC, the application running on the private subnet instance invokes the Amazon Bedrock API call. The API call is routed to the Amazon Bedrock VPC endpoint that is associated with the VPC endpoint policy and then to the Amazon Bedrock APIs. The Amazon Bedrock service API endpoint receives the API request over PrivateLink without traversing the public internet. You also have the option of connecting to the Amazon Bedrock service API through a NAT gateway. Note that in this case, the traffic still goes over the AWS network backbone without being exposed to the public internet.

You can also privately access Amazon Bedrock APIs over the VPC endpoint from your corporate network through an AWS Direct Connect gateway. If you don’t have Direct Connect, you can connect to the Amazon Bedrock service API over the public internet (shown by the lower arrow in Figure 1). In each of these cases, traffic to the API endpoint for Amazon Bedrock is encrypted in flight using TLS 1.2 or later, and traffic within the Amazon Bedrock service is also encrypted in flight to at least this standard. Customer content processed by Amazon Bedrock is encrypted and stored at rest in the Region where you are using Amazon Bedrock.
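As an illustrative sketch of the VPC path described above, the following Python snippet shows how an application in a private subnet might call the Amazon Bedrock runtime API with boto3. The model ID, Region, and request schema are assumptions for illustration; with private DNS enabled on the interface endpoint, the default regional endpoint name resolves to the endpoint's private IP addresses, so no endpoint URL override is needed:

```python
import json


def build_invoke_request(prompt: str, max_tokens: int = 256) -> dict:
    """Build an InvokeModel request body (Anthropic schema, illustrative)."""
    return {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }


def invoke_over_privatelink(prompt: str) -> dict:
    """Invoke Bedrock from a subnet served by the interface endpoint.

    Requires AWS credentials. With private DNS enabled on the endpoint,
    the regional hostname below resolves to private IPs, so the request
    never leaves the VPC.
    """
    import boto3  # imported here; the function is not called at import time

    client = boto3.client("bedrock-runtime", region_name="us-east-1")
    response = client.invoke_model(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",  # placeholder model ID
        body=json.dumps(build_invoke_request(prompt)),
    )
    return json.loads(response["body"].read())
```

The application code is identical with or without PrivateLink; the routing decision is made entirely by the VPC endpoint and DNS configuration.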

Minimize layer 7 generative AI threats with AWS WAF

As generative AI systems become integral to content creation, customer service, and decision-making processes, they are increasingly targeted by malicious bot threats. These exploits can distort outputs, flood models with biased or harmful training data (data poisoning), exploit vulnerabilities for prompt injection, or overwhelm systems through automated abuse. The consequences include degraded model performance, spread of misinformation, compromised data privacy, and erosion of user trust. To mitigate these threats, safeguards such as user authentication, input validation, anomaly detection, and continuous monitoring must be embedded into generative AI pipelines.

AWS WAF is a web application firewall that helps protect applications (OSI layer 7) from bot exploits by using intelligent detection and rule-based defenses. Its Bot Control feature identifies and filters out harmful bots while allowing legitimate ones. Through rate limiting, custom rules, and anomaly detection, AWS WAF can block scraping, credential stuffing, and DDoS attempts. The anti-DDoS rule group, targeted specifically at automatic mitigation of application exploits that involve HTTP request floods, is available as a Managed Rules group through AWS WAF. It removes the complexity associated with managing various AWS WAF rules and ACLs to handle these increasingly agile threats.

AWS WAF can be enabled on Amazon CloudFront, Amazon API Gateway, and Application Load Balancer (ALB), and is deployed alongside these services (Figure 2). These AWS services terminate the TCP/TLS connection, process incoming HTTP requests, and then forward the requests to AWS WAF for inspection and filtering. There is no need for a reverse proxy, DNS changes, or TLS certificate management.
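As a sketch of these protections, the following snippet assembles a web ACL that combines a per-IP rate-based rule with the AWS-managed Bot Control rule group, using the WAFV2 API through boto3. The ACL name, rate limit, and metric names are illustrative assumptions to tune for your traffic profile:

```python
def build_web_acl_rules(rate_limit: int = 2000) -> list:
    """Two-rule sketch: per-IP rate limiting plus AWS Bot Control."""
    return [
        {
            "Name": "RateLimitPerIP",
            "Priority": 0,
            # Block any single IP exceeding rate_limit requests per 5 minutes.
            "Statement": {
                "RateBasedStatement": {"Limit": rate_limit, "AggregateKeyType": "IP"}
            },
            "Action": {"Block": {}},
            "VisibilityConfig": {
                "SampledRequestsEnabled": True,
                "CloudWatchMetricsEnabled": True,
                "MetricName": "RateLimitPerIP",
            },
        },
        {
            "Name": "BotControl",
            "Priority": 1,
            "Statement": {
                "ManagedRuleGroupStatement": {
                    "VendorName": "AWS",
                    "Name": "AWSManagedRulesBotControlRuleSet",
                    "ManagedRuleGroupConfigs": [
                        {"AWSManagedRulesBotControlRuleSet": {"InspectionLevel": "COMMON"}}
                    ],
                }
            },
            "OverrideAction": {"None": {}},
            "VisibilityConfig": {
                "SampledRequestsEnabled": True,
                "CloudWatchMetricsEnabled": True,
                "MetricName": "BotControl",
            },
        },
    ]


def create_web_acl():
    """Create the web ACL. Requires AWS credentials.

    Web ACLs for CloudFront use the CLOUDFRONT scope and must be created
    in us-east-1; use REGIONAL for an ALB or API Gateway.
    """
    import boto3

    wafv2 = boto3.client("wafv2", region_name="us-east-1")
    return wafv2.create_web_acl(
        Name="genai-web-acl",  # placeholder name
        Scope="CLOUDFRONT",
        DefaultAction={"Allow": {}},
        Rules=build_web_acl_rules(),
        VisibilityConfig={
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": "genai-web-acl",
        },
    )
```

After creation, associate the web ACL with your CloudFront distribution so every request is evaluated before it reaches your origin.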

Figure 2 – Architecture using AWS WAF to minimize layer 7 generative AI threats

Mitigate DDoS at the edge for generative AI applications

DDoS attacks pose a serious threat to generative AI applications by overwhelming servers with massive traffic, leading to latency, degraded performance, or complete outages. Because generative AI workloads are often resource-intensive and operate in real time (for example, chatbots, image generators, and coding assistants), even brief disruptions can impact user experience and trust. Moreover, DDoS attacks can be used as a smokescreen for other exploits, such as data exfiltration or prompt injection. Protecting generative AI systems with scalable defenses such as rate limiting, traffic filtering, and auto-scaling infrastructure is crucial to help maintain availability and service continuity.

AWS Shield safeguards generative AI applications from DDoS attacks by providing always-on detection and automated mitigation. The standard tier, AWS Shield Standard, defends against common volumetric and state-exhaustion attacks at no additional cost. For advanced protection, AWS Shield Advanced offers real-time threat intelligence, adaptive rate limiting, and 24/7 access to the AWS Shield Response Team (SRT). To use the services of the SRT, you must be subscribed to the Business Support plan or the Enterprise Support plan. This helps make sure that generative AI services—often reliant on high availability and low latency—remain resilient under threat, maintaining performance and uptime even during large-scale traffic surges. Integration with services like Amazon CloudFront and Elastic Load Balancing further enhances scalability and protection (Figure 3).
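As a minimal sketch, assuming an active Shield Advanced subscription, the snippet below associates Shield Advanced protection with a CloudFront distribution through the Shield API; the distribution ARN and protection name are placeholders:

```python
def build_protection_request(resource_arn: str, name: str) -> dict:
    """Parameters for shield.create_protection, with a light ARN sanity check."""
    if not resource_arn.startswith("arn:aws:"):
        raise ValueError("expected an AWS resource ARN")
    return {"Name": name, "ResourceArn": resource_arn}


def protect_distribution(distribution_arn: str):
    """Enable Shield Advanced protection for one resource.

    Requires AWS credentials and an active Shield Advanced subscription.
    The Shield API is served from the us-east-1 Region.
    """
    import boto3

    shield = boto3.client("shield", region_name="us-east-1")
    return shield.create_protection(
        **build_protection_request(distribution_arn, "genai-frontend")
    )
```

Protections can be created the same way for other eligible resources such as Application Load Balancers and Elastic IP addresses.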

Figure 3 – Help protect your applications from DDoS attacks by using AWS Shield Advanced at the edge

Perimeter firewall for generative AI applications

AWS Network Firewall is a managed network security service that you can use to deploy stateful and stateless packet inspection, intrusion prevention (IPS), and domain filtering capabilities directly into your Amazon VPCs. It helps inspect and filter both inbound and outbound traffic at the subnet level. For generative AI applications, this means enforcing fine-grained traffic controls without the complexity of managing your own appliances or proxies. You can use AWS Network Firewall to create custom stateless or stateful rules to block specific payloads, known signatures, or unusual traffic patterns. In multi-model or multi-tenant environments, the firewall can help enforce east-west segmentation, so that a compromised microservice cannot laterally access other AI components or sensitive services. Network Firewall can also be effective in collecting hostnames of the specific sites that are being accessed by your generative AI application. This process is called egress filtering and is specifically helpful in case an adversary compromises the generative AI workload and tries to establish a connection to an external command and control system. Network Firewall can be used to help secure outbound traffic by blocking packets that fail to meet certain security requirements.
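Egress filtering of this kind can be expressed as a stateful rule group. The sketch below, with an illustrative domain list and rule group name, generates an allow list matched against TLS SNI and HTTP Host headers so that outbound connections to unlisted domains are dropped:

```python
def build_egress_allowlist(domains: list) -> dict:
    """Stateful rule group definition allow-listing expected egress domains.

    A leading dot (for example ".amazonaws.com") matches the domain and
    all of its subdomains.
    """
    return {
        "RulesSource": {
            "RulesSourceList": {
                "Targets": domains,
                "TargetTypes": ["TLS_SNI", "HTTP_HOST"],
                # ALLOWLIST: traffic to any domain not listed is dropped.
                "GeneratedRulesType": "ALLOWLIST",
            }
        }
    }


def create_egress_rule_group():
    """Create the rule group. Requires AWS credentials; names are placeholders."""
    import boto3

    nfw = boto3.client("network-firewall")
    return nfw.create_rule_group(
        RuleGroupName="genai-egress-allowlist",  # placeholder
        Type="STATEFUL",
        Capacity=100,
        RuleGroup=build_egress_allowlist(
            [".amazonaws.com", ".example-model-registry.com"]  # illustrative
        ),
    )
```

Attach the rule group to your firewall policy so the allow list applies to traffic leaving the application subnets.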

Monitor for malicious activity

Monitoring for malicious activity is essential to protect generative AI applications from evolving security threats. These applications process unpredictable user inputs and generate dynamic outputs, making them particularly vulnerable to exploitation. Continuous monitoring enables early detection of unusual traffic patterns, excessive API usage, or anomalous input behavior, symptoms that might indicate potential exploits. It also helps prevent misuse of AI models through prompt injection, adversarial inputs, or attempts to extract sensitive information from model responses. In addition, monitoring plays a critical role in identifying DDoS attempts and resource abuse, which could otherwise disrupt the availability of AI services. By observing and analyzing real-time activity, organizations can take proactive steps to block malicious actors, adjust security controls, and maintain the integrity and reliability of their generative AI applications.

Amazon GuardDuty, a threat detection service, continuously analyzes AWS account activity, network flow logs, and DNS queries to uncover potential compromises or malicious behaviors targeting your environment. GuardDuty identifies suspicious activity such as AWS credential exfiltration and suspicious API usage in Amazon SageMaker. Additionally, GuardDuty offers protection plans for Amazon Simple Storage Service (Amazon S3), Amazon Relational Database Service (Amazon RDS), Amazon Elastic Kubernetes Service (Amazon EKS), runtime monitoring for Amazon EKS, Amazon ECS, and Amazon EC2, malware protection for Amazon EC2 and Amazon S3, and AWS Lambda protection.

Amazon Inspector is an automated vulnerability management service that continually scans AWS workloads for software vulnerabilities and unintended network exposure. Amazon Detective simplifies the investigative process and helps security teams conduct faster and more effective forensic investigations.
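As a sketch of programmatic monitoring, the snippet below polls GuardDuty for high-severity findings with boto3; the severity threshold is an assumption to tune to your alerting policy:

```python
def build_severity_filter(min_severity: int = 7) -> dict:
    """FindingCriteria for guardduty.list_findings.

    GuardDuty severity ranges up to 8.9; a threshold of 7 keeps only
    high-severity findings.
    """
    return {"Criterion": {"severity": {"Gte": min_severity}}}


def high_severity_findings() -> list:
    """Fetch current high-severity GuardDuty findings.

    Requires AWS credentials and an existing GuardDuty detector in the
    Region of the default session.
    """
    import boto3

    gd = boto3.client("guardduty")
    detector_id = gd.list_detectors()["DetectorIds"][0]
    finding_ids = gd.list_findings(
        DetectorId=detector_id,
        FindingCriteria=build_severity_filter(),
    )["FindingIds"]
    if not finding_ids:
        return []
    return gd.get_findings(
        DetectorId=detector_id, FindingIds=finding_ids
    )["Findings"]
```

In practice, you would wire findings like these into Amazon EventBridge or CloudWatch alarms rather than polling on demand.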

Network defense in depth for generative AI

As with other modern applications, a defense-in-depth approach is recommended when designing network architectures for generative AI applications. A complete reference architecture of a generative AI application showing defense-in-depth protection using AWS services is shown in Figure 4.

Figure 4 – Workflow for generative AI network defense in depth

The workflow shown in Figure 4 is as follows:

  1. A client makes a request to your application. DNS directs the client to a CloudFront location, where AWS WAF and Shield are deployed.
  2. CloudFront sends the request through an AWS WAF rule to determine whether to block, monitor, or allow the traffic. Shield can mitigate a wide range of known DDoS attack vectors and zero-day attack vectors. Depending on the configuration, Shield Advanced and AWS WAF work together to rate-limit traffic coming from individual IP addresses. If neither AWS WAF nor Shield Advanced blocks the traffic, the services send it to the CloudFront routing rules.
  3. CloudFront sends the traffic to the ALB. However, before reaching the ALB, the traffic is inspected through a Network Firewall endpoint. Network Firewall supports deep packet inspection to decrypt, inspect, and re-encrypt inbound and outbound TLS traffic destined for the internet, another VPC, or another subnet to help protect data. You can limit access by threat actors at this stage with additional safeguards. If you don’t expect traffic from high-risk countries, it is advisable to restrict access through geographic blocking, or at least to apply a strict rate limit for those countries, by using AWS WAF rules on ingress and Network Firewall on egress.

    Note: If you use Amazon CloudFront geographic restrictions to block a country’s access to your content, then CloudFront blocks every request from that country. CloudFront doesn’t forward the requests to AWS WAF. To use AWS WAF criteria to allow or block requests based on geography, use an AWS WAF geographic match rule statement instead.

  4. The ALB is in a public subnet, which keeps the instances that run your application isolated from the rest of the world. At the ALB, you can additionally help protect against common layer 7 exploits with AWS WAF.
  5. The ALB has target groups in the form of instances that run the generative AI application in a private subnet. You can help protect the instances and their network interfaces with foundational VPC constructs like security groups, network ACLs (NACLs), and segmentation.
  6. The application calls the Amazon Bedrock API. You can use PrivateLink to create a private connection between your VPC and Amazon Bedrock. You can then access Amazon Bedrock as if it were in your VPC, without the use of an internet gateway, NAT device, VPN connection, or Direct Connect connection. Instances in your VPC don’t need public IP addresses to access Amazon Bedrock. You establish this private connection by creating an interface endpoint, powered by PrivateLink. You create an endpoint network interface in each subnet that you enable for the interface endpoint. These are requester-managed network interfaces that serve as the entry point for traffic destined for Amazon Bedrock.
  7. Create an interface endpoint for Amazon Bedrock using either the Amazon VPC console or the AWS Command Line Interface (AWS CLI), specifying the following service name: com.amazonaws.region.bedrock-runtime
  8. Create an endpoint policy for your interface endpoint. An endpoint policy is an AWS Identity and Access Management (IAM) resource that you can attach to an interface endpoint. The default endpoint policy allows full access to Amazon Bedrock through the interface endpoint. To control the access allowed to Amazon Bedrock from your VPC, attach a custom endpoint policy to the interface endpoint. An example of a custom endpoint policy is shown in Figure 4. When you attach this policy to your interface endpoint, it grants access to the listed Amazon Bedrock actions for all principals on all resources.
  9. This solution uses Amazon CloudWatch to collect operational metrics from various services to generate custom dashboards that you can use to monitor the deployment’s performance and operational health.
  10. The return flow of the traffic traverses the same path in reverse direction.
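Steps 7 and 8 above can be sketched with boto3 as follows; the VPC, subnet, and security group IDs are placeholders, and the endpoint policy shown is an illustrative restriction to the InvokeModel action:

```python
import json


def build_endpoint_policy(allowed_model_arn: str) -> dict:
    """Custom endpoint policy limiting the endpoint to InvokeModel.

    Narrower than the default policy, which allows full access to
    Amazon Bedrock through the interface endpoint.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": "*",
                "Action": ["bedrock:InvokeModel"],
                "Resource": [allowed_model_arn],
            }
        ],
    }


def create_bedrock_endpoint():
    """Create the interface endpoint with the policy attached.

    Requires AWS credentials; all resource IDs below are placeholders.
    """
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")
    return ec2.create_vpc_endpoint(
        VpcEndpointType="Interface",
        VpcId="vpc-0123456789abcdef0",            # placeholder
        ServiceName="com.amazonaws.us-east-1.bedrock-runtime",
        SubnetIds=["subnet-0123456789abcdef0"],   # placeholder
        SecurityGroupIds=["sg-0123456789abcdef0"],  # placeholder
        PrivateDnsEnabled=True,
        PolicyDocument=json.dumps(
            build_endpoint_policy("arn:aws:bedrock:us-east-1::foundation-model/*")
        ),
    )
```

Enabling private DNS lets the application keep using the default regional Bedrock hostname while traffic is routed through the endpoint.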

Conclusion

In this post, we reviewed the secure network design principles that provide a robust foundation for deploying generative AI applications on AWS while maintaining strong security controls. By implementing the patterns described in this post, you can confidently use AI capabilities while protecting sensitive data and infrastructure.

Want to dive deeper into additional areas of generative AI security? Check out the other posts in the Securing generative AI series:

  • Part 1 – Securing generative AI: An introduction to the generative AI Security Scoping Matrix
  • Part 2 – Designing generative AI workloads for resilience
  • Part 3 – Securing generative AI: Applying relevant security controls
  • Part 4 – Securing generative AI: data, compliance, and privacy considerations
  • Part 5 – Build secure network architectures for generative AI applications using AWS services (this post)
Joydipto Banerjee
Joydipto is a Solutions Architect in AWS Financial Services with experience in software architecture and the development of solutions for business-critical workloads. He works with leading banks and financial institutions to help them use AWS tools and services to drive innovation and build new digital products.