Artificial Agency advances always-on AI gameplay with AWS and Llama

Artificial Agency builds a behavior engine that enables agentic AI in games. The engine powers intelligent characters and systems that can reason about player intent and respond naturally over long play sessions. Bringing this to life at scale requires AI that can run in real time, across millions of concurrent players. To meet that bar, Artificial Agency is working with Meta Llama models and Amazon Web Services (AWS) to fine-tune small, game-specific models that deliver strong performance at a fraction of the size of larger models. This approach reduces infrastructure demands, lowers latency, and opens new creative possibilities for game studios.
Bringing intelligence to the game world
Most game characters are on a set path. They have scripts, behavior trees, and finite decision paths that are carefully designed to handle what players might do, but only what designers anticipated. What they can't do is improvise, adapt, or genuinely respond to a player's intent in real time. That gap is what Artificial Agency was built to close. The company develops a behavior engine that brings agentic AI into games, where it powers intelligent NPCs, companions, and dynamic systems.
“With traditional systems, you're limited to scripted interactions or predefined behaviors,” says Alex Kearney, co-founder and head of machine learning at Artificial Agency. “What we're enabling is something more dynamic—agents that can understand what a player means, not just what they say, and act in a way that feels intentional.” A game director system might create new encounters on the fly based on how the player is behaving. If a player asks for a pencil, a character might walk to a desk, pick one up, and bring it back. A companion might notice a player's health dropping and step in without being told.
The system boasts in-engine tools that integrate directly into Unity, Unreal, and other game engines and a cloud platform that manages agent state, memory, and how a character's understanding evolves over time. The goal, Kearney explains, is straightforward: “Game studios want to bring AI into gameplay in a way that feels natural to players, but they don't want to become machine learning experts to do it. What we're building is a way for developers to create intelligent characters and systems without needing to manage all that complexity themselves.”
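Artificial Agency hasn't published its platform APIs, but the idea of a cloud-managed agent state — a rolling memory that feeds each model call — can be sketched roughly like this (all class and field names here are hypothetical illustrations, not the company's actual schema):

```python
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    """Rolling event log; in a real system, older events would be
    summarized into long-term memory rather than simply dropped."""
    events: list = field(default_factory=list)
    max_events: int = 50

    def observe(self, event: str) -> None:
        self.events.append(event)
        if len(self.events) > self.max_events:
            self.events.pop(0)  # evict the oldest event

@dataclass
class AgentState:
    """Per-character state the cloud platform would persist."""
    name: str
    memory: AgentMemory = field(default_factory=AgentMemory)

    def context_prompt(self, player_input: str) -> str:
        """Assemble the context a game-specific model reasons over:
        who the character is, what just happened, what the player said."""
        recent = "\n".join(self.memory.events[-5:])
        return (
            f"Character: {self.name}\n"
            f"Recent events:\n{recent}\n"
            f"Player: {player_input}"
        )

companion = AgentState(name="Mira")
companion.memory.observe("Player health dropped below 30%")
print(companion.context_prompt("I need help!"))
```

The key design point is that the game engine only emits events and reads back actions; the prompt assembly, memory eviction, and summarization live on the platform side, which is what lets studios avoid managing that complexity themselves.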
Delivering this in a live game, however, presents constraints that differ from other AI applications. Games require real-time performance across long play sessions, strict latency budgets, and infrastructure capable of supporting millions of concurrent players on hardware that's already been pushed to its limits.
From frontier models to game-specific AI
Artificial Agency's approach starts with frontier-scale models and uses game-specific data to train smaller, specialized models tailored to each game's needs. “Large models can do impressive things, but they come with constraints. If you want AI to be always on in a game, you need something smaller, more efficient, and tailored to that environment.” A 70-billion parameter model knows ancient history, astrophysics, and how to bake a cake. Any given game might need some of that, but not all of it, and not at the resource cost that comes with it. That's why Artificial Agency is working to fine-tune Llama models across 1B, 3B, and 8B variants against a 70B performance baseline. These open-source models offer customizability, long-term stability, and something that matters deeply in creative work: ownership. A model built to a studio's artistic vision stays that way without surprise updates or model drift that quietly changes how characters behave over time.
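Distilling a frontier model's behavior into a small game-specific one starts with the training data. As a minimal sketch, logged game interactions might be converted into chat-format records for supervised fine-tuning — the field names and JSON action schema below are assumptions for illustration, not Artificial Agency's actual format:

```python
def to_sft_example(scene: dict) -> dict:
    """Turn one logged game interaction into a chat-format training
    record, suitable for a chat template during fine-tuning."""
    system = (
        f"You are {scene['npc']}, a character in {scene['game']}. "
        "Respond with a JSON action."
    )
    return {
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": scene["player_utterance"]},
            {"role": "assistant", "content": scene["npc_action"]},
        ]
    }

# A single hypothetical log entry: the player asks for an item,
# and the target behavior (from a larger model or a designer) is
# a structured action the game engine can execute.
log = {
    "game": "DemoQuest",
    "npc": "Shopkeeper",
    "player_utterance": "Do you have a pencil?",
    "npc_action": '{"action": "fetch_item", "item": "pencil", "say": "One moment."}',
}
example = to_sft_example(log)
```

Because the assistant turns are structured actions rather than free text, parse errors become directly measurable — which is what makes the parse-error metrics discussed below possible.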
In the early stages of this training, Artificial Agency followed a staged process. Supervised fine-tuning (SFT) came first, dramatically improving base-model reliability. Reinforcement learning with GRPO (Group Relative Policy Optimization) then targeted behavioral problems seen out of the box, such as looping: models occasionally repeated themselves, got stuck in patterns, and failed to drive narratives forward. In emergent gameplay with loosened guardrails, that breaks immersion fast. Additional datasets simulated challenging scenarios, and a purpose-built evaluation suite measured behavior in realistic game contexts. “After fine-tuning,” says Dhruv Mullick, senior machine learning engineer at Artificial Agency, “we were able to significantly improve reliability and get the model to a point where it could run within the game system.”
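The core mechanic of GRPO is that several completions are sampled for the same prompt and each is scored relative to its own group, so no separate value (critic) model is needed. The group-relative advantage computation can be sketched in a few lines (the reward values below are hypothetical, e.g. from an anti-looping reward function):

```python
from statistics import mean, pstdev

def grpo_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each completion's reward against the mean and spread
    of its sampling group: completions better than their siblings get
    positive advantage, worse ones get negative advantage."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four completions sampled for one prompt; a hypothetical reward
# penalizes the two that looped on themselves.
rewards = [1.0, 0.2, 0.2, 1.0]
adv = grpo_advantages(rewards)
```

These advantages then weight the policy-gradient update, pushing the model toward the non-looping completions. Centering within the group means advantages always sum to (approximately) zero, so the update rewards relative quality rather than absolute reward scale.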
AWS has been a central partner throughout. Artificial Agency uses Amazon SageMaker on Amazon Elastic Compute Cloud (Amazon EC2) infrastructure to train, evaluate, and iterate at scale. The team is also working with the AWS Generative AI Innovation Center (GenAIIC), which provides advisory support and proof-of-concept guidance, including training strategies, model configurations, and large-scale training approaches. AWS teams work directly alongside Artificial Agency to refine reward functions, debug issues, and accelerate iteration. “AWS has been a strong partner in helping us iterate through this process,” Kearney says, “testing different training approaches, refining how we evaluate behavior, and working through the challenges of getting these models to perform in a game environment.” The AWS team also implemented distributed training architectures including sequence parallelism to handle long input contexts, alongside a multi-GPU setup for both training and inference.
Proving smaller can be smarter
Early results showed meaningful progress across the metrics that matter most in a game context. In a proof of concept, supervised fine-tuning reduced parse error rates from approximately 78.7 percent to around 1 percent. Pass rates on key gameplay scenarios increased from roughly 6 percent to about 51 percent, and looping behaviors were cut by approximately half. As Mullick emphasizes, those numbers are only meaningful because of what they represent: “We're not just measuring abstract benchmarks. We're evaluating how the model behaves in real gameplay scenarios. That includes things like whether it avoids looping, whether it can adapt to emergent player-driven gameplay, and whether it maintains coherence over time.”
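Metrics like "looping behaviors cut by half" imply an automated check over transcripts. Artificial Agency's evaluation suite isn't public, but a simple heuristic of the kind such a suite might include — flagging a transcript when a run of consecutive turns repeats back-to-back — looks like this (purely illustrative, not the team's actual implementation):

```python
def has_loop(turns: list[str], window: int = 2) -> bool:
    """Flag a transcript as looping if any run of `window` consecutive
    turns is immediately repeated verbatim."""
    for i in range(len(turns) - 2 * window + 1):
        if turns[i : i + window] == turns[i + window : i + 2 * window]:
            return True
    return False

def loop_rate(transcripts: list[list[str]]) -> float:
    """Fraction of transcripts exhibiting looping — the kind of
    aggregate a training run would track before and after RL."""
    flagged = sum(has_loop(t) for t in transcripts)
    return flagged / len(transcripts) if transcripts else 0.0
```

In practice such checks would sit alongside parse-rate and scenario pass-rate metrics, and would use fuzzier matching than exact string equality, but the principle — scoring behavior in realistic game contexts rather than on abstract benchmarks — is the same one Mullick describes.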
Internal testing also showed the fine-tuned smaller models moving substantially closer to the 70B baseline in game-specific performance. In other words, a 1B model—though 70 times smaller—is beginning to perform like one that dwarfs it when trained on the right data for the right context. Smaller models mean lower infrastructure demands, lower latency, and the ability to deploy AI across more of a game without cost becoming a creative constraint. In addition, says Kearney, “A model at the one-billion parameter level can run on consumer hardware. That's a meaningful shift, because it means you're no longer dependent on large-scale infrastructure for every interaction.” On-device inference changes what's possible architecturally, economically, and creatively in ways that simply aren't available when a large model is the only option.
The work is ongoing: Final model training and qualitative playtests, including live playtests with real players comparing the fine-tuned model against the 70B baseline, are still ahead. “We're still refining the models and continuing to evaluate performance,” Kearney says, “but the early signals are encouraging. We're seeing progress toward making real-time, AI-driven gameplay viable at scale.” The next generation of game characters won't just know their lines. They'll understand the scene.
Ready to train and deploy models at scale? Learn how Amazon SageMaker and Amazon EC2 can help you build, iterate, and run AI workloads for even the most demanding real-time environments.