AWS Partner Network (APN) Blog
Architecting agentic AI for scale and trust from the start
By: Noel Williams, Banking and Capital Markets Leader – PwC Australia
By: Sanya Mediratta, Senior Manager, Digital Trust – PwC Australia
By: Jerry Chen, Senior Solutions Architect – AWS
The race to deploy AI agents is accelerating—and so is the cost of getting it wrong.
The IBM Cost of a Data Breach 2025 report found that 13% of organizations using AI experienced a breach. Of those, 97% identified weaknesses in AI access controls. Recent research from PwC shows companies with a mature responsible AI program recover faster from incidents. Those without one take more than three times as long—and might never fully recover value.
AI introduces new risks. But many existing controls, such as data governance, access controls, and operational processes, remain essential. In fact, AI can amplify any weaknesses in those controls at speed and scale.
Responsible AI works when organizations integrate AI-specific safeguards into existing risk frameworks and engineer controls into the agentic platform infrastructure. Those controls must be designed upfront and run as core functionality throughout the agent’s lifecycle. This is what moves organizations from pilot to scaled value.
You don’t need to start from scratch. But you do need to design trust from day one. This post brings together PwC’s risk and governance experience with Amazon Web Services (AWS) technical architecture and tooling. It sets out a practical blueprint for deploying AI agents safely—and at scale.
Three questions your board will ask
Let’s examine a common use case where your organization deploys an AI agent to handle customer support: triaging inquiries, resolving issues, and escalating complex cases. Within months, it’s successfully handling thousands of interactions daily at impressive speed, cost, and scale.
Before that happens, three questions will determine whether the solution scales and whether it remains trusted:
- Can we explain how this solution makes decisions before we deploy it?
- Who owns, monitors, and evaluates the overall solution performance, and when should they intervene?
- Can we demonstrate sustained and trusted performance, and do we have evidence to prove it?
The following graphic illustrates these questions and how one naturally leads to the next.
Figure 1 – Illustration of the question sequence
The following sections go into each of these questions in detail.
Can we explain how this solution makes decisions before we deploy it?
Your customer support agent will triage thousands of inquiries daily. Before it touches a single customer interaction, your board needs confidence that they understand why it makes those decisions. Explainability isn’t an afterthought—it needs to be architected in from day one.
Operationalizing agentic AI on AWS in the AWS Prescriptive Guidance documentation provides practical frameworks across three dimensions:
- Strategy — AI roadmap development with clear linkage to business outcomes, establishing governance frameworks that prioritize explainability from the outset
- People — Structured skills development so relevant teams can interpret and communicate AI decisions
- Technology — Agentic workflow design and system integration patterns that build observability and explainability into the architecture
AWS also has services and features that are particularly useful for you in this context:
- Amazon Bedrock Guardrails Automated Reasoning checks can detect hallucinations, suggest corrections, and highlight unstated assumptions in the response of your generative AI application. More importantly, Automated Reasoning checks can explain why a statement is accurate using mathematically verifiable, deterministic formal logic.
- Amazon SageMaker Clarify helps organizations detect bias in their AI models and explain why a model made a specific decision in plain terms that regulators, auditors, and boards can understand.
- Amazon Bedrock AgentCore Observability helps you trace, debug, and monitor agent performance in production environments. It offers detailed visualizations of each step in the agent workflow, so you can inspect an agent’s execution path, audit intermediate outputs, and debug performance bottlenecks and failures.
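To make the explainability idea concrete, the following is a minimal, framework-agnostic sketch of a decision trace: every step the agent takes records not just its output but its rationale, so individual decisions can be explained later. The class and field names here are illustrative, not an AgentCore or Bedrock API.

```python
import time
from dataclasses import dataclass, field


@dataclass
class DecisionTrace:
    """Minimal audit record for one agent interaction (illustrative,
    not an AWS API): each step stores its output and rationale so an
    individual decision, not just aggregate performance, can be explained."""
    inquiry_id: str
    steps: list = field(default_factory=list)

    def record(self, step: str, output: str, rationale: str) -> None:
        self.steps.append({
            "step": step,
            "output": output,
            "rationale": rationale,   # the "why", not only the "what"
            "timestamp": time.time(),
        })

    def explain(self) -> str:
        """Render the reasoning chain in plain terms for an auditor."""
        return "\n".join(
            f"{i + 1}. {s['step']}: {s['output']} (because {s['rationale']})"
            for i, s in enumerate(self.steps)
        )


trace = DecisionTrace(inquiry_id="INQ-1042")
trace.record("triage", "billing", "inquiry mentions a duplicate charge")
trace.record("route", "refund-team", "billing disputes go to refunds")
print(trace.explain())
```

In production, records like these would be emitted as structured traces (for example, OpenTelemetry spans consumed by AgentCore Observability) rather than held in memory.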
Illustrative PwC governance checkpoints
Ask the following questions before deployment:
- Has the board aligned on commercial rationale, values, and risk appetite?
- Does this AI use case comply with applicable legal and regulatory requirements?
- How does this agent use personal, third-party, and confidential information—in both training data and at inference?
- Does the training data reflect all customer segments? Have potential blind spots been identified?
- Can individual decisions be explained—not only model performance?
- Do outputs make intuitive sense to domain experts?
Who owns, monitors, and evaluates this solution’s performance, and when should they intervene?
Your agent is live. It’s handling 5,000 customer inquiries daily. Performance looks strong initially, but in week three, escalation volumes rise and resolution rates drop.
Without end-to-end observability across the agent’s reasoning chain, your operations team sees only the surface symptoms and can’t isolate the root cause.
Without clear technical thresholds and defined governance ownership, teams start pointing fingers. Nobody knows whether to pause the agent, restrict it to lower-risk interactions, or keep watching.
The AWS Well-Architected Generative AI Lens outlines design principles that help organizations track, control, and intervene in AI agent performance:
- Observability and reporting — Comprehensive monitoring of performance, cost, and security across foundation models (FMs)—giving teams a single view of system health
- Overload mitigation — Rate limits, workflow tracing, and circuit-breaker patterns to understand how the system behaves under stress and prevent cascading failures
- Operational controls — Timeout mechanisms on agentic workflows and automated conditions to halt long-running or anomalous processes before they cause harm
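The overload-mitigation principle above can be sketched as a simple circuit breaker around an agent call. This is our own illustrative pattern, not an AWS API: after a configurable number of consecutive failures, the circuit opens and further calls are rejected until a cooldown elapses, preventing cascading failures.

```python
import time


class CircuitBreaker:
    """Illustrative circuit breaker for an agent workflow (names are
    ours, not an AWS API): after `max_failures` consecutive errors the
    circuit opens and calls are rejected until `cooldown` seconds pass."""

    def __init__(self, max_failures: int = 3, cooldown: float = 30.0):
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.cooldown:
                raise RuntimeError("circuit open: agent paused")
            # Cooldown elapsed: half-open, allow one trial call.
            self.opened_at = None
            self.failures = 0
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.time()  # stop cascading failures
            raise
        self.failures = 0  # any success resets the counter
        return result
```

The "circuit open" state maps naturally to a governance decision point: pause the agent, restrict it to lower-risk interactions, or escalate to the product owner.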
PwC teams have helped customers adopt AWS generative AI solutions to optimize application performance. For example, they used the AWS Well-Architected IaC Analyzer to automatically analyze infrastructure as code (IaC) artifacts and generate prescriptive recommendations, shortening performance evaluation time from hours to minutes.
Illustrative PwC governance checkpoints
Operational accountability requires clear answers to the following questions:
- Is there a designated product owner with well-defined roles and responsibilities across enabling teams?
- Have evaluation metrics been defined before deployment—not after the first incident?
- Are change and release management controls applied consistently to all AI models?
- Does the organization have observability across the agent’s full reasoning chain—not only surface-level performance metrics?
- Is there a documented change management process capturing what changed, why, and what was retested?
Can we demonstrate sustained and trusted performance, and do we have evidence to prove it?
The agent has been live for 12 months. A customer disputes a refund denial and lodges a complaint with a financial conduct regulator. The regulator asks, “Show us how you handled this customer’s inquiry. Was the decision fair? How do you know?”
You have 48 hours to respond.
Observability and auditability need to be built into the architecture from the start—not retrofitted under pressure. Three AWS capabilities are foundational here:
- Amazon Bedrock AgentCore Observability — AgentCore Observability enables real-time monitoring of agentic systems through Amazon CloudWatch-powered dashboards tracking latency, token usage, and error rates, with OpenTelemetry-compatible trace visualizations for debugging performance bottlenecks.
- AWS Audit Manager AI best practice framework v2 — AWS Audit Manager provides a prebuilt standard framework to help you gain visibility into how your generative AI implementation on Amazon Bedrock and Amazon SageMaker AI is working compared to AWS recommended best practices.
- AWS CloudTrail — Captures API calls for generative AI workloads, including Amazon Bedrock Agents and Amazon Bedrock Knowledge Bases. AWS CloudTrail Lake enables SQL-based querying to reconstruct decision chains.
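As a sketch of reconstructing a decision chain, the helper below builds a CloudTrail Lake SQL statement over a time window. The event data store ID is caller-supplied, and the `bedrock.amazonaws.com` event source and field names follow CloudTrail's standard event schema; treat the exact filter values as assumptions to verify against your own trails.

```python
def decision_chain_query(event_data_store_id: str, start: str, end: str) -> str:
    """Build a CloudTrail Lake SQL statement that pulls Amazon Bedrock
    API activity for a time window, so a decision chain can be
    reconstructed. The event data store ID and timestamps are caller
    supplied; field names follow CloudTrail's standard event schema."""
    return (
        f"SELECT eventTime, eventName, userIdentity.arn, requestParameters "
        f"FROM {event_data_store_id} "
        f"WHERE eventSource = 'bedrock.amazonaws.com' "
        f"AND eventTime BETWEEN '{start}' AND '{end}' "
        f"ORDER BY eventTime"
    )


query = decision_chain_query("eds-0123", "2025-01-01 00:00:00", "2025-01-02 00:00:00")
print(query)
```

The resulting statement can then be submitted through CloudTrail Lake's StartQuery API (for example, boto3's `cloudtrail.start_query(QueryStatement=...)`) and the results exported as audit evidence.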
Illustrative PwC governance checkpoints
Auditability and fairness require ongoing discipline:
- Does the organization maintain a complete log of all agent decisions, including the rationale for each?
- Are quarterly fairness audits conducted comparing agent performance across customer segments?
- How does the organization use continuous monitoring and automated testing to detect model drift?
- Is there a documented response protocol for disputes, and can the agent’s compliance with that protocol be measured?
- Can the organization demonstrate that corrective action has been taken to prevent recurrence?
Building trust at scale
Here’s the competitive reality: organizations that move fastest aren’t the ones that skip governance—they’re the ones that embed it from the start. Transparent decision-making, clear observability and audit trails, and rigorous review processes aren’t constraints on speed. Done right, they’re a competitive advantage, demonstrated to customers through fair outcomes, to boards through measurable controls, and to regulators through auditable evidence.
Australia’s Voluntary AI Safety Standards, published in October 2025, reinforce this approach. The ten guardrails—spanning transparency, human oversight, contestability, and accountability—are principle-based by design, giving organizations latitude to tailor implementation to their specific risk profile. But that flexibility comes with responsibility. In the absence of prescriptive rules, the onus falls on organizations to demonstrate that their governance is fit for purpose.
PwC and AWS bring complementary strengths to this challenge. PwC’s deep experience in governance, risk, and responsible AI frameworks, paired with AWS technical architecture, tooling, and operational guidance, provides not merely a strategy but a deployable path to AI agents that are trusted and trustworthy.
Ready to get started? Explore the AWS Well-Architected Generative AI Lens and reach out to your AWS and PwC teams to discuss how to deploy trusted AI agents safely within your organization.
PwC – AWS Partner Spotlight
PwC is an AWS Premier Tier Services Partner that helps you drive innovation throughout IT and the business to compete in today’s service economy.


