AWS for Industries
Deploy Agentic Bidding Without Sacrificing Speed: ARTF Containers with NVIDIA GPU Acceleration on AWS
AWS is building the infrastructure for programmatic advertising’s shift to agentic AI, in which autonomous agents plan campaigns, orchestrate models, and optimize bids across the full funnel. Today, the bidstream processes billions of decisions daily, each within milliseconds, relying on rule-based heuristics and lightweight models constrained by real-time latency budgets and CPU-only infrastructure. That constraint is ending. Agentic architecture introduces capabilities the bidstream has never had: memory, planning, and the ability to act across time rather than inside a single auction.
AWS is bringing cloud infrastructure, foundation models, and NVIDIA GPU-accelerated computing together into a single stack for advertising technology (ad tech). It meets the industry where it is today and scales with it toward an agentic future.
The Guidance for Accelerator-Optimized Agentic Bidding on AWS is a production-ready reference implementation that brings NVIDIA GPU-accelerated deep learning inference into the programmatic bidding pipeline. Using NVIDIA Triton Inference Server for real-time inference and the IAB Tech Lab’s Agentic Real-Time Framework (ARTF), the solution lets demand-side platforms (DSPs), supply-side platforms (SSPs), and independent software vendors (ISVs) run GPU-accelerated, containerized AI agent services in the auction path, where the bidstream lives. The result is lower latency, fewer data hops, and stronger data protection, with model-driven decisions delivered as fast as, or faster than, traditional programmatic implementations.
What is ARTF
The Agentic Real-Time Framework (ARTF) is an open industry standard published by the IAB Tech Lab. It defines how AI-powered containers participate in real-time bidding. ARTF containers receive bid requests, run inference, and propose typed mutations to the bidstream: structured changes such as adjusting a bid price, activating an audience segment, filtering deals, or adding a quality score. The host platform (DSP, bidder, or ad platform) reviews and applies approved mutations before the auction continues.
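The propose-then-approve contract can be sketched in a few lines. This is a minimal illustration, assuming hypothetical intent names (`ADJUST_BID`, `FILTER_DEAL`, and so on) and a simple allow-list policy on the host; it does not reproduce the ARTF specification's actual message schema.

```python
from dataclasses import dataclass
from enum import Enum


class Intent(Enum):
    # Hypothetical intent names based on the examples in the text,
    # not the field names defined by the ARTF specification.
    ADJUST_BID = "adjust_bid"
    ACTIVATE_SEGMENT = "activate_segment"
    FILTER_DEAL = "filter_deal"
    ADD_METRIC = "add_metric"


@dataclass(frozen=True)
class Mutation:
    intent: Intent
    value: object  # new price, segment ID, deal ID, or (metric_name, score)


def review_and_apply(request, proposals, allowed):
    """Host-side review: apply mutations whose intent is allowed, return the rest."""
    updated = dict(request)  # never modify the original request in place
    rejected = []
    for m in proposals:
        if m.intent not in allowed:
            rejected.append(m)  # containers only propose; the host decides
        elif m.intent is Intent.ADJUST_BID:
            updated["bid_price"] = m.value
        elif m.intent is Intent.ACTIVATE_SEGMENT:
            updated["segments"] = updated.get("segments", []) + [m.value]
        elif m.intent is Intent.FILTER_DEAL:
            updated["deals"] = [d for d in updated.get("deals", []) if d != m.value]
        elif m.intent is Intent.ADD_METRIC:
            name, score = m.value
            updated["metrics"] = {**updated.get("metrics", {}), name: score}
    return updated, rejected


request = {"id": "req-1", "bid_price": 2.00, "deals": ["d1", "d2"]}
proposals = [
    Mutation(Intent.ADJUST_BID, 1.60),
    Mutation(Intent.FILTER_DEAL, "d2"),
    Mutation(Intent.ADD_METRIC, ("viewability", 0.72)),
]
# This host policy allows bid and deal changes but not metric enrichment.
updated, rejected = review_and_apply(
    request, proposals, allowed={Intent.ADJUST_BID, Intent.FILTER_DEAL})
print(updated)
print(rejected)
```

Keeping the approval policy on the host side means a container can never change the bid request directly; it can only ask.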
ARTF replaces monolithic bidding logic with modular AI microservices. Each container handles a specific bidding decision using a purpose-built model optimized for that task, and returns its structured output, such as a bid price adjustment, an audience activation, or a deal score, before the host platform continues processing. This makes bidding intelligence composable: teams can deploy, update, or improve individual models without disrupting the rest of the stack.
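Composability is easiest to see when each container is treated as an independent entry in a registry. The sketch below is a hypothetical model, assuming each container is a plain function from a bid request to a list of proposed changes; in the Guidance these are separate services, but the swap-one-entry idea is the same.

```python
from typing import Callable

# Each hypothetical "container" maps a bid request to a list of proposed changes.
Container = Callable[[dict], list]


def bid_shader_v1(req):
    return [{"intent": "adjust_bid", "value": round(req["bid_price"] * 0.9, 2)}]


def bid_shader_v2(req):
    # A newer model version with a more aggressive (illustrative) shading rule.
    return [{"intent": "adjust_bid", "value": round(req["bid_price"] * 0.8, 2)}]


def quality_enricher(req):
    return [{"intent": "add_metric", "value": ("brand_safety", 0.95)}]


def collect_proposals(pipeline: dict[str, Container], req: dict) -> dict[str, list]:
    """Run every registered container against the same bid request."""
    return {name: container(req) for name, container in pipeline.items()}


pipeline = {"bid_shading": bid_shader_v1, "quality": quality_enricher}
req = {"id": "req-7", "bid_price": 2.50}
before = collect_proposals(pipeline, req)

# Upgrade one model by swapping a single registry entry; other containers are untouched.
pipeline["bid_shading"] = bid_shader_v2
after = collect_proposals(pipeline, req)
print(before)
print(after)
```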
Guidance for Accelerator-Optimized Agentic Bidding
The Guidance demonstrates four ARTF-compliant containers. ARTF is not limited to these use cases; they serve as starting points. Three of the containers run industry-standard deep learning recommender models, GPU-accelerated and served through NVIDIA Triton Inference Server on Amazon Elastic Kubernetes Service (Amazon EKS). The fourth is a rule-based metrics enricher that runs on CPU.
Bid price optimization: A Deep Learning Recommendation Model (DLRM) predicts click-through rate from user, site, and device features, then computes an optimal shaded bid price. Advertisers spend less per impression while maintaining competitive win rates, improving return on ad spend.
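The step from a predicted click-through rate to a shaded bid can be illustrated with simple arithmetic. This is a hypothetical formula for illustration only: the predicted CTR is taken as an input (the DLRM itself is not shown), and `value_per_click`, `shade_factor`, and `max_cpm` are assumed parameters rather than the Guidance's actual pricing logic.

```python
def shaded_bid_cpm(p_ctr, value_per_click, shade_factor, max_cpm):
    """Turn a predicted CTR into a shaded CPM bid.

    Assumed formula (hypothetical): expected value per thousand impressions
    = p_ctr * value_per_click * 1000, then scaled by shade_factor in (0, 1]
    and capped at max_cpm.
    """
    if not 0.0 < shade_factor <= 1.0:
        raise ValueError("shade_factor must be in (0, 1]")
    expected_cpm = p_ctr * value_per_click * 1000.0
    return round(min(expected_cpm * shade_factor, max_cpm), 4)


# A 0.2% predicted CTR on a click worth $1.50 is worth about $3.00 CPM;
# shading to 80% of that value bids about $2.40.
bid = shaded_bid_cpm(p_ctr=0.002, value_per_click=1.50, shade_factor=0.8, max_cpm=5.0)
print(bid)
```

Shading below expected value is what lets an advertiser pay less per impression; the shade factor controls the trade-off against win rate.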
Audience segment activation: A Wide & Deep neural network scores user-segment affinities by combining memorization of known high-value patterns with generalization to unseen feature combinations. Advertisers reach their highest-value audiences at every impression without relying on static segment lists.
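The memorization-plus-generalization idea behind Wide & Deep comes from summing two logits before a single sigmoid. The toy scorer below uses small, hand-set, hypothetical weights and feature names to show that structure; a production model learns these weights and is far larger.

```python
import math

# Wide part: linear weights on crossed features, which memorize known pairs.
WIDE_WEIGHTS = {("site=news", "segment=auto_intender"): 1.2,
                ("device=mobile", "segment=auto_intender"): 0.4}
# Deep part: dense embeddings per feature, which generalize to unseen combinations.
EMBEDDINGS = {"site=news": [0.3, -0.1], "device=mobile": [0.2, 0.5],
              "segment=auto_intender": [0.6, 0.1], "site=sports": [0.25, -0.05]}
HIDDEN_W = [[0.8, -0.3], [0.4, 0.9]]  # one 2x2 hidden layer
OUTPUT_W = [1.0, 0.7]
BIAS = -1.0


def affinity(features):
    # Wide logit: sum the weights of crossed feature pairs present in the input.
    wide = sum(w for pair, w in WIDE_WEIGHTS.items() if set(pair) <= set(features))
    # Deep logit: average embeddings, one ReLU hidden layer, linear output.
    embs = [EMBEDDINGS[f] for f in features if f in EMBEDDINGS]
    avg = [sum(col) / len(embs) for col in zip(*embs)] if embs else [0.0, 0.0]
    hidden = [max(0.0, sum(w * x for w, x in zip(row, avg))) for row in HIDDEN_W]
    deep = sum(w * h for w, h in zip(OUTPUT_W, hidden))
    # Wide & Deep combines both logits before a single sigmoid.
    return 1.0 / (1.0 + math.exp(-(wide + deep + BIAS)))


seen = affinity(["site=news", "segment=auto_intender"])     # wide + deep signal
unseen = affinity(["site=sports", "segment=auto_intender"])  # deep signal only
print(round(seen, 3), round(unseen, 3))
```

The unseen sports/auto-intender pair has no memorized cross weight, yet still receives a score from the deep branch because its embeddings resemble the news pair's.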
Private marketplace deal management: Neural Collaborative Filtering (NCF) predicts user-deal relevance, autonomously activating high-affinity deals and suppressing poor matches. Campaign managers can scale private marketplace strategies across thousands of deals without increasing operational overhead.
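A simplified version of the deal decision can be written around the Generalized Matrix Factorization (GMF) branch of NCF, which scores a user-item pair as a sigmoid over a weighted element-wise product of their embeddings. The full NCF (NeuMF) model also has an MLP branch, omitted here. The embeddings and the activate/suppress thresholds below are hypothetical.

```python
import math


def gmf_score(user_vec, deal_vec, h):
    """GMF branch of NCF: sigmoid(h . (p_u * q_i))."""
    interaction = [u * d for u, d in zip(user_vec, deal_vec)]  # element-wise product
    return 1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(h, interaction))))


def deal_actions(user_vec, deal_vecs, h, activate_at=0.7, suppress_below=0.3):
    """Map relevance scores to hypothetical activate / suppress / leave decisions."""
    actions = {}
    for deal_id, vec in deal_vecs.items():
        s = gmf_score(user_vec, vec, h)
        if s >= activate_at:
            actions[deal_id] = "activate"
        elif s < suppress_below:
            actions[deal_id] = "suppress"
        else:
            actions[deal_id] = "leave"
    return actions


user = [0.9, -0.4, 0.3]
deals = {"pmp-auto": [1.2, -0.8, 0.5],
         "pmp-kids": [-1.0, 0.9, -0.6],
         "pmp-news": [0.1, 0.1, 0.0]}
h = [1.5, 1.5, 1.5]
actions = deal_actions(user, deals, h)
print(actions)
```

Because the decision is a threshold on a learned score, the same logic applies whether a campaign has ten deals or thousands.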
Quality metrics enrichment: A rule-based container adds viewability and brand safety scores, giving bidders richer signal to avoid low-quality inventory and protect brand reputation. This container also demonstrates that ARTF’s modular architecture supports both ML and deterministic logic in the same pipeline.
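A deterministic enricher needs no model at all. The rules below are a hypothetical sketch: the viewability increments, the slot-size threshold, and the blocked category list (IAB Content Taxonomy 1.0 codes IAB25 "Non-Standard Content" and IAB26 "Illegal Content", used here only as examples) are assumptions, not the Guidance's actual scoring.

```python
def quality_metrics(req, blocked_categories=frozenset({"IAB25", "IAB26"})):
    """Deterministic enrichment: fixed rules, no model (thresholds are hypothetical)."""
    # Viewability heuristic: above-the-fold placements and larger slots score higher.
    viewability = 0.4
    if req.get("above_fold"):
        viewability += 0.3
    w, h = req.get("size", (0, 0))
    if w * h >= 300 * 250:
        viewability += 0.2
    # Brand safety: zero if any page category is on the block list.
    cats = set(req.get("categories", []))
    brand_safety = 0.0 if cats & blocked_categories else 1.0
    return {"viewability": round(min(viewability, 1.0), 2), "brand_safety": brand_safety}


good = quality_metrics({"above_fold": True, "size": (300, 250), "categories": ["IAB1"]})
risky = quality_metrics({"size": (320, 50), "categories": ["IAB25"]})
print(good, risky)
```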
Figure 1: Architecture diagram showing the Guidance for Accelerator-Optimized Agentic Bidding on AWS, including ARTF-compliant containers, NVIDIA Triton Inference Server, Amazon Bedrock AgentCore MCP integration, and the orchestration layer connecting bid optimization, audience segmentation, deal management, and metrics enrichment services.
Why GPU acceleration matters for bidding
Agentic containers add processing steps to the bidstream. The IAB Tech Lab’s ARTF gives inference a defined place inside the auction itself, but these deep learning models need GPU acceleration to run within real-time auction latency constraints. On CPU alone, the additional inference steps can push response times beyond auction deadlines.
NVIDIA Triton Inference Server running on NVIDIA GPU instances in Amazon EKS addresses this. Triton’s dynamic batching groups concurrent requests to maximize GPU throughput, while multi-model serving runs all three neural networks on a single GPU instance. Together, these keep deep learning inference within programmatic latency budgets at a cost that scales with demand. The latest-generation NVIDIA GPUs on AWS, including Amazon EC2 G7e instances based on the NVIDIA Blackwell architecture, provide additional memory capacity and throughput for more demanding real-time advertising workloads. Running GPU workloads on AWS converts capital expenditure into flexible, consumption-based pricing, provides access to the latest NVIDIA GPU instances without procurement lead times, and lets ad tech teams iterate at the speed their market requires.
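Dynamic batching trades a small, bounded queueing delay for larger GPU batches. The simulation below is a simplified model, not Triton's scheduler: it mirrors the two Triton model-configuration knobs `max_batch_size` and `dynamic_batching.max_queue_delay_microseconds` by closing a batch when it is full or when its oldest request has waited longer than the allowed delay. The arrival times are illustrative.

```python
def dynamic_batches(arrivals_us, max_batch_size, max_queue_delay_us):
    """Group request arrival times (microseconds) into batches.

    Simplified model: a batch is dispatched when it reaches max_batch_size,
    or when a new arrival comes after the oldest queued request's delay window.
    """
    batches, current = [], []
    for t in sorted(arrivals_us):
        # Close the open batch if the oldest request has waited too long.
        if current and t - current[0] > max_queue_delay_us:
            batches.append(current)
            current = []
        current.append(t)
        if len(current) == max_batch_size:
            batches.append(current)
            current = []
    if current:
        batches.append(current)
    return batches


arrivals = [0, 20, 40, 60, 500, 510, 2000]
batches = dynamic_batches(arrivals, max_batch_size=3, max_queue_delay_us=100)
print(batches)
```

Bursty traffic (the first four requests) is packed into full batches, while isolated requests are released after at most the configured delay, which is what keeps the batching compatible with an auction deadline.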
The agentic dimension
Each ARTF container in the solution exposes a standard agent interface, the Model Context Protocol (MCP), which is the same protocol AI systems use to invoke tools. This means the inference layer being built today can be called by agents with memory, goals, and campaign context, not just by auction requests. The services deployed for programmatic buying can also become tools that an orchestrating agent calls when programmatic is one capability among many.
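MCP messages are JSON-RPC 2.0, and an agent invokes a tool with the `tools/call` method, passing the tool `name` and its `arguments`. The handler below is a minimal sketch of that exchange for a hypothetical `shade_bid` tool; a real server would use an MCP SDK and also implement the initialization handshake and `tools/list`, which are omitted here.

```python
import json


def shade_bid(bid_price: float, shade_factor: float) -> float:
    # Hypothetical tool standing in for the bid-shading container.
    return round(bid_price * shade_factor, 2)


TOOLS = {"shade_bid": shade_bid}  # hypothetical tool registry


def handle(message: str) -> str:
    """Minimal MCP-style handler for the JSON-RPC 2.0 "tools/call" method."""
    req = json.loads(message)
    if req.get("method") != "tools/call":
        return json.dumps({"jsonrpc": "2.0", "id": req.get("id"),
                           "error": {"code": -32601, "message": "Method not found"}})
    params = req["params"]
    if params["name"] not in TOOLS:
        return json.dumps({"jsonrpc": "2.0", "id": req.get("id"),
                           "error": {"code": -32602, "message": "Unknown tool"}})
    result = TOOLS[params["name"]](**params.get("arguments", {}))
    # Tool results are returned as a list of content items.
    return json.dumps({"jsonrpc": "2.0", "id": req["id"],
                       "result": {"content": [{"type": "text", "text": str(result)}],
                                  "isError": False}})


call = {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
        "params": {"name": "shade_bid",
                   "arguments": {"bid_price": 2.5, "shade_factor": 0.8}}}
response = json.loads(handle(json.dumps(call)))
print(response)
```

The same function that serves auction traffic can sit behind this interface, which is why an orchestrating agent can call it without a separate integration.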
Through Amazon Bedrock AgentCore and MCP integration, advertisers and platform teams can test how ARTF agent services, such as bid shading, audience activation, and deal-management containers, respond to different bid request scenarios. For example, before increasing bids for a private marketplace deal, a team could test sample bid requests to see whether the deal-management container would activate the deal and whether the bid-shading container would propose a price adjustment. Media buyers can review the proposed bidstream updates before applying those services in production.
The architecture is also designed for closed-loop learning, in which bidding outcomes feed governed retraining workflows so that models improve over time. NVIDIA NeMo-RL supports reinforcement learning and campaign-level bidding optimization based on auction outcome data. Once a Model Governance agent validates a new model version through A/B testing and approves it, compatible model artifacts can be optimized with NVIDIA TensorRT for low-latency inference. Where applicable, NVIDIA NIM can complement the Triton inference path with GPU-accelerated inference microservices for approved models, while the governance and deployment workflow manages rollout. Two feedback loops operate in parallel: a batch retraining loop that improves prediction accuracy for click-through rate, conversion, and bid optimization, and an agentic loop in which a Bid Shading Strategy agent refines pricing parameters based on win rates and competitive dynamics. The result is a system whose model behavior can improve over time without adding latency to real-time bidding.
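A governance gate of this kind can be as simple as a significance test on the A/B results. The sketch below assumes a hypothetical promotion rule: promote the challenger only if its click-through rate beats the champion's under a one-sided two-proportion z-test, with `z_crit=1.96` (roughly a 2.5% one-sided false-positive rate). The click and impression counts are illustrative, not results.

```python
import math


def promote_challenger(champ_clicks, champ_imps, chall_clicks, chall_imps, z_crit=1.96):
    """Hypothetical governance gate using a one-sided two-proportion z-test on CTR."""
    p1 = champ_clicks / champ_imps
    p2 = chall_clicks / chall_imps
    # Pooled CTR under the null hypothesis that both models perform the same.
    pooled = (champ_clicks + chall_clicks) / (champ_imps + chall_imps)
    se = math.sqrt(pooled * (1 - pooled) * (1 / champ_imps + 1 / chall_imps))
    z = (p2 - p1) / se
    return z > z_crit, z


# 0.20% vs 0.23% CTR over one million impressions each: z is about 4.6, promote.
clear_win, z_win = promote_challenger(2_000, 1_000_000, 2_300, 1_000_000)
# 0.20% vs 0.205% CTR: the lift is too small to distinguish from noise, keep champion.
marginal, z_marginal = promote_challenger(2_000, 1_000_000, 2_050, 1_000_000)
print(clear_win, round(z_win, 2), marginal, round(z_marginal, 2))
```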
Accelerating the stack
NVIDIA provides the GPU-accelerated inference infrastructure through NVIDIA Triton Inference Server for the DLRM, Wide & Deep, and NCF models used in this solution. The AWS-NVIDIA collaboration extends beyond the current recommender-style inference path to future closed-loop learning workflows with NVIDIA NeMo-RL and NVIDIA NIM. These workflows can support reinforcement learning, policy optimization, and optimized inference deployment for advertising models and bidding strategies based on campaign performance signals.
The complete solution is open source and available on GitHub.
For engineering teams evaluating the solution, the repository includes:
- Four ARTF-compliant containers
- Pre-trained NVIDIA models ready for inference with optimized serving configuration
- Auto-scaling infrastructure templates for production deployment
- Amazon Bedrock AgentCore MCP server for conversational scenario simulation
- A testing frontend for validating mutations against sample bid requests
- A single-command deployment script that provisions the entire stack on AWS
- A local development environment for testing
Advertising technology providers can deploy the solution as-is for evaluation or use it as a reference for integrating their own proprietary models into the ARTF ecosystem. One partner already extending ARTF’s capabilities is Bridge, which brings deterministic identity resolution directly into the agentic bidding pipeline.
Bridge: Deterministic identity for agentic decisions
Bridge is the deterministic identity layer for agentic advertising. Within the reference implementation, Bridge provides the identity anchor that audience-targeting containers rely on to resolve real users. The company resolves who’s actually on the other end of an impression, identifying a real, verified, consented person rather than a probabilistic guess, and makes that identity, and the signals around it, available right inside ARTF. Because Bridge runs where the decisions happen, there’s no waiting on an outside service mid-auction and no raw user data ever leaves the pipeline. The result is targeting that agents and programmatic systems can actually trust.
“As advertising gets more automated, the question underneath it all stays the same: do you actually know who you’re talking to? Our job is to make sure the answer is yes. Every identity your agents act on should be deterministic: real, verified, and consented, not a guess stitched together from fragments. With Bridge ARTF Agents in this Guidance, agentic systems [have] a source of truth they can trust at the moment of decision, so the media they buy is more accurate, more responsive, and delivers a stronger return.” – Robert Rose, President & CEO, Bridge
What’s next
Part 2 of the Guidance will feature closed-loop learning based on bidding outcomes, enabling models to continuously improve from real auction results, along with additional ISV container examples that expand the ecosystem of ready-to-deploy agentic components. Beyond the next release, as ARTF adoption grows, we expect:
- DSPs deploying proprietary models as ARTF containers, creating a modular ecosystem where bidding intelligence can be composed and optimized independently
- AI agents that orchestrate multiple ARTF containers to simulate campaign outcomes before committing budget
- Convergence of ARTF for real-time execution with the IAB Tech Lab’s Agentic Advertising Management Protocols (AAMP) for strategic orchestration, enabling autonomous campaign management
AWS is building the foundation for this future. The reference implementation targets ARTF, but the underlying GPU-accelerated infrastructure also applies to other ad tech use cases today: real-time audience scoring, creative optimization, campaign pacing, and attribution.
Ready to deploy agentic bidding intelligence? Contact the AWS AdTech Solutions team to get early access to the reference implementation, schedule a technical deep dive, or discuss how ARTF containers with hardware acceleration can transform your programmatic strategy.