AWS Public Sector Blog

A governance framework for building trustworthy agentic AI for public sector and regulated organizations

Public sector organizations face a growing challenge as they adopt agentic AI systems: how to benefit from increased AI autonomy while continuing to meet security, compliance, and accountability expectations. Unlike traditional AI systems that respond to prompts or execute narrowly defined tasks, agentic AI systems can understand context, make decisions, plan multistep workflows, and take autonomous actions. These capabilities introduce governance and risk considerations that existing AI control models don’t fully address.

As agentic AI systems gain the ability to act across systems, data, and services, the consequences of design gaps, misconfiguration, or unintended behavior increase. For organizations operating in regulated environments, this makes governance, auditability, and operational control foundational requirements rather than optional enhancements.

This post outlines a practical governance framework for agentic AI systems, with a focus on public sector and other highly regulated environments. It introduces a scope-based model for classifying agent autonomy, identifies core security dimensions, and describes how organizations can align agentic AI governance with existing risk, compliance, and assurance programs. You’ll learn how to classify AI systems by autonomy level, implement scope-appropriate security controls, and align your governance approach with international standards such as ISO/IEC 42001:2023.

In a future post, we’ll cover the practical implementation steps, including phased rollout, threat modeling, automation, and audit preparation.

This post covers:

  • How agentic AI systems differ from traditional AI from a governance and risk perspective
  • A scope-based model for classifying agent autonomy and authority
  • Six security dimensions that support trustworthy agentic AI systems
  • How agentic AI governance can align with existing compliance and audit expectations
  • Amazon Web Services (AWS) service capabilities and compliance posture relevant to agentic AI governance

Understanding agentic AI

Agentic AI represents an evolution from reactive assistants to proactive, autonomous systems that can understand, decide, and act with minimal oversight. These systems access tools, data, and external services to navigate complex tasks, adapt to changing conditions, and collaborate with other agents to accomplish goals.

Two characteristics are especially important for governance:

  • Autonomy – The degree to which the system can make decisions without human intervention
  • Agency – The scope of actions the system is authorized to take within its environment

Understanding where your AI systems fall on these dimensions is the first step toward implementing appropriate governance. Treating all AI systems as equivalent can result in either over-constraining low-risk use cases or under-governing highly autonomous systems.

A scope-based approach to agentic AI security

Not all agentic AI systems require the same level of security controls. Applying a single control model across all implementations often leads to unnecessary friction or unaddressed risk. A scope-based classification approach helps organizations match governance controls to actual system capability and impact.

The following framework defines four scope levels based on the degree of agency and autonomy.

Scope 1: No agency

Scope 1 systems operate in read-only or advisory mode. They are human-initiated, follow fixed execution paths, and can’t modify systems or data.

These systems typically analyze information, summarize content, or provide recommendations. Governance is still required, but risk is limited because the system can’t take direct action.

Scope 2: Prescribed agency

Scope 2 systems can propose or prepare changes but require explicit human approval before execution. They might access multiple tools or systems, but a human remains responsible for authorizing each consequential action.

This scope is appropriate for systems that draft policy updates, generate configuration recommendations, or prepare remediation steps for human review.

Scope 3: Supervised agency

Scope 3 systems execute end-to-end workflows after human initiation. They select tools dynamically and can complete tasks autonomously within predefined boundaries. Human oversight remains available through monitoring, intervention points, or escalation paths.

Effective human escalation in scope 3 means the agent can recognize when a situation exceeds its authority or confidence threshold and surface the decision to a human with full context, including what it attempted, why it’s uncertain, and what options it recommends. The human receives enough information to make an informed decision without reinvestigating the situation from scratch.
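
As a minimal sketch of what such an escalation could carry, the following Python models the context handed to a reviewer. The `EscalationRequest` class, its field names, and the 0.8 confidence threshold are illustrative assumptions, not part of any AWS service or of the scope model itself.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class EscalationRequest:
    """Context an agent hands to a human reviewer (hypothetical schema)."""
    task_id: str
    attempted_actions: list[str]    # what the agent already tried
    uncertainty_reason: str         # why it is escalating
    recommended_options: list[str]  # choices offered to the reviewer
    confidence: float               # agent's self-reported confidence, 0.0 to 1.0


# Assumed threshold; tune it per workload and risk appetite.
CONFIDENCE_THRESHOLD = 0.8


def needs_escalation(confidence: float, within_authority: bool) -> bool:
    """Escalate when the action is outside authority or confidence is too low."""
    return (not within_authority) or confidence < CONFIDENCE_THRESHOLD


def build_escalation(task_id, attempted, reason, options, confidence) -> str:
    """Serialize the full context so the reviewer does not have to reinvestigate."""
    request = EscalationRequest(task_id, attempted, reason, options, confidence)
    return json.dumps(asdict(request), indent=2)
```

Keeping the attempted actions, the reason for uncertainty, and the recommended options in one record is what lets the reviewer decide without reconstructing the situation.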

Examples include systems that respond automatically to defined security events while escalating higher-risk situations for review.

Distinguishing scope 2 from scope 3

The boundary between scope 2 and scope 3 is where many organizations need the most clarity. The following table provides concrete attributes to help classify a system.

| Attribute | Scope 2 (prescribed agency) | Scope 3 (supervised agency) |
| --- | --- | --- |
| Action execution | Proposes actions; human approves each one before execution | Executes actions autonomously within defined boundaries |
| Tool selection | Uses a fixed, predefined set of tools | Selects tools dynamically based on context |
| Data access | Read access to operational data; write access gated by human approval | Read and write access within scoped permissions |
| Decision authority | Human makes every consequential decision | Agent makes decisions within policy; escalates exceptions |
| Workflow complexity | Single-step or linear workflows | Multistep, branching workflows with conditional logic |
| Human involvement | Approval required before each action | Monitoring and intervention available; approval required only at escalation points |
| Example | Agent drafts a remediation plan and waits for engineer approval | Agent detects a misconfiguration, applies a preapproved fix, and notifies the team |

Decision criteria: If the system can execute any action that modifies state without a human explicitly approving that specific action, it’s scope 3 or higher. If every write operation requires a human to review and confirm, it’s scope 2.
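
The decision criteria can be sketched as a small classification function. This is an illustration only: the `AgentProfile` fields are hypothetical inventory attributes, and the three yes/no questions are a simplification of the attributes described in this post.

```python
from dataclasses import dataclass
from enum import IntEnum


class Scope(IntEnum):
    NO_AGENCY = 1
    PRESCRIBED = 2
    SUPERVISED = 3
    FULL = 4


@dataclass(frozen=True)
class AgentProfile:
    """Hypothetical attributes an AI inventory might record for each system."""
    can_modify_state: bool            # any path exists to change systems or data
    every_write_needs_approval: bool  # a human confirms each specific write
    self_initiates: bool              # acts without a direct human prompt


def classify_scope(profile: AgentProfile) -> Scope:
    """Apply the decision criteria in order, from least to most autonomy."""
    if not profile.can_modify_state:
        return Scope.NO_AGENCY
    if profile.every_write_needs_approval:
        return Scope.PRESCRIBED
    # From here on, at least one write can run without per-action approval.
    if profile.self_initiates:
        return Scope.FULL
    return Scope.SUPERVISED
```

For example, `classify_scope(AgentProfile(True, False, False))` returns `Scope.SUPERVISED`, matching the rule that any unapproved state change places a system at scope 3 or higher.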

Scope 4: Full agency

Scope 4 systems operate with continuous autonomy and can initiate actions without direct human prompting. They might adapt behavior over time and operate independently for extended periods, with humans providing strategic oversight rather than task-level control.

This scope requires the most rigorous governance and is appropriate only where organizations have mature controls, monitoring, and assurance mechanisms in place.

Six security dimensions for agentic AI governance

Regardless of scope level, effective governance of agentic AI systems requires controls across six security dimensions. These dimensions aren’t new security concepts, but agentic systems combine them in ways that increase the impact of gaps or misconfiguration.

Identity context

Agentic systems must operate under clearly defined identities with explicit authorization boundaries. This includes the ability to act on behalf of users or services while maintaining traceability and accountability. With AWS Identity and Access Management (IAM), you can define granular permissions that specify exactly which actions each agent can perform and under what conditions. Strong identity controls help support auditability and help prevent unintended privilege escalation.

Data, memory, and state protection

Agentic AI systems often maintain persistent memory and state across interactions. Protecting this information requires access controls, encryption, and safeguards against unauthorized modification. AWS Key Management Service (AWS KMS) and AWS Secrets Manager help protect sensitive data and credentials that agents access, but they address what the agent can reach, not the integrity of the agent’s own memory.

Memory integrity requires additional controls that protect the agent’s reasoning context from corruption or manipulation:

  • Memory expiration and retention policies – Define how long agent memory persists and when it must be purged. Short-lived memory reduces the window for memory-based issues and helps prevent stale context from influencing future decisions. Implement time-to-live (TTL) policies on conversation history, session state, and cached tool outputs.
  • Read-only memory for lower scopes – For scope 1 and scope 2 systems, enforce read-only memory where the agent can reference prior context but can’t modify its own memory store. This helps prevent an agent from being manipulated into rewriting its own instructions or context through adversarial inputs.
  • Memory isolation between sessions – Prevent cross-session memory contamination by isolating memory stores per user, per task, or per security boundary. An agent processing one user’s request must not carry over context from another user’s session.
  • Integrity validation – Implement checksums or cryptographic signatures on memory state so that unauthorized modifications to an agent’s stored context can be detected before the agent acts on corrupted data.

These controls become increasingly important as scope increases. A scope 1 system with read-only memory and short TTLs has a limited surface area for issues. A scope 4 system with persistent, writable memory across sessions requires all these protections plus continuous monitoring for memory drift or injection.
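
The following sketch shows how TTL expiration, per-session isolation, and integrity validation might fit together. It is a toy in-process store written for illustration; the `SessionMemory` class and its methods are hypothetical, and a production system would use a managed data store with the signing key held in a service such as AWS Secrets Manager or AWS KMS.

```python
import hashlib
import hmac
import json
import time
from typing import Optional


class SessionMemory:
    """Toy memory store with TTL, session isolation, and HMAC integrity checks."""

    def __init__(self, signing_key: bytes, ttl_seconds: float):
        self._key = signing_key
        self._ttl = ttl_seconds
        self._store = {}  # keyed by (session_id, name) so sessions never share entries

    def _sign(self, payload: bytes) -> str:
        return hmac.new(self._key, payload, hashlib.sha256).hexdigest()

    def put(self, session_id: str, name: str, value, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        payload = json.dumps(value, sort_keys=True).encode()
        self._store[(session_id, name)] = {
            "payload": payload,
            "signature": self._sign(payload),
            "expires_at": now + self._ttl,
        }

    def get(self, session_id: str, name: str, now: Optional[float] = None):
        now = time.time() if now is None else now
        entry = self._store.get((session_id, name))
        if entry is None:
            return None
        if now >= entry["expires_at"]:
            del self._store[(session_id, name)]  # purge expired context
            return None
        # Detect tampering before the agent acts on the stored context.
        if not hmac.compare_digest(entry["signature"], self._sign(entry["payload"])):
            raise ValueError("memory entry failed integrity check")
        return json.loads(entry["payload"])
```

A read from a different session ID returns nothing, an expired entry is purged on access, and a modified payload raises an error instead of being returned to the agent.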

Audit and logging

When AI systems act autonomously, comprehensive logging becomes essential. Governance requires visibility into what actions were taken, when they occurred, and the context that led to those decisions.

AWS CloudTrail and Amazon CloudWatch provide visibility into API-level actions and system events, but to capture the full decision context (the reasoning chain that leads to those actions), you must combine these with Amazon Bedrock invocation logging and custom application-level tracing of agent steps.

This distinction matters for governance:

  • API-level logging (CloudTrail, CloudWatch) – Records what happened, including which APIs were called, by which identity, at what time, with what parameters. This is the foundation for accountability and audit trails.
  • Decision-context logging (Amazon Bedrock invocation logging, custom tracing) – Records why it happened, including what prompt the agent received, what reasoning it applied, what alternatives it considered, and why it chose a specific action. This is what auditors and incident responders need to understand agent behavior.

For regulated environments, both layers are required. API logs alone can tell you that an agent modified a security group, but decision-context logs tell you what triggered that decision and whether the agent’s reasoning was sound.
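
One way to connect the two layers is to emit a structured decision record that carries the request ID of the API call it led to. The sketch below uses only the Python standard library; the record schema and the `log_decision` helper are assumptions for illustration, and it presumes your application captures the request ID from the API response.

```python
import json
import logging
import uuid
from datetime import datetime, timezone

logger = logging.getLogger("agent.decisions")


def decision_record(agent_id, prompt_summary, reasoning, alternatives,
                    chosen_action, api_request_id=None):
    """Build one structured decision-context record (hypothetical schema)."""
    return {
        "record_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent_id": agent_id,
        "prompt_summary": prompt_summary,
        "reasoning": reasoning,
        "alternatives_considered": alternatives,
        "chosen_action": chosen_action,
        # Request ID of the resulting API call, so this record can be
        # joined to the matching API-level log entry during an audit.
        "api_request_id": api_request_id,
    }


def log_decision(**fields) -> str:
    """Emit the record as a single JSON line and return it."""
    line = json.dumps(decision_record(**fields), sort_keys=True)
    logger.info(line)
    return line
```

With a shared identifier in both places, an auditor can start from an API event and find the reasoning that produced it, or the reverse.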

Agent and foundation model (FM) controls

Guardrails help prevent agents from producing harmful outputs or executing unsafe actions. These controls might include input validation, output filtering, behavioral constraints, and isolation mechanisms to help limit the scope of impact if a system behaves unexpectedly. Amazon Bedrock Guardrails provides customizable safeguards for content filtering, and process isolation helps keep a compromised agent from affecting other systems.

Agency boundaries and policies

Clear, enforceable boundaries define what an agent can and can’t do. These boundaries must be implemented through technical controls rather than relying solely on policy documentation.

IAM alone provides significant boundary enforcement for agentic systems. IAM policies can restrict which API actions an agent can call, which resources it can access, and under what conditions, such as time of day, source IP, and whether multi-factor authentication (MFA) is present. IAM session policies can further constrain permissions for individual agent invocations, and permissions boundaries can set a maximum privilege ceiling that no policy can exceed. For many scope 1 and scope 2 systems, IAM policies combined with resource-based policies provide sufficient boundary enforcement without additional tooling.
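
As an illustration of these conditions, the following sketch builds a policy document that limits an agent to reading from one bucket, from a known network range, inside a fixed time window, and denies deletes when MFA isn't present. The bucket ARN, CIDR range, and dates are placeholder values; the condition keys (`aws:SourceIp`, `aws:CurrentTime`, `aws:MultiFactorAuthPresent`) are standard IAM global condition keys.

```python
import json


def build_agent_policy(bucket_arn: str, allowed_cidr: str,
                       not_before: str, not_after: str) -> dict:
    """Return an IAM policy document as a dict (all inputs are placeholders)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadWorkspaceFromKnownNetworkInWindow",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"{bucket_arn}/*",
                "Condition": {
                    "IpAddress": {"aws:SourceIp": allowed_cidr},
                    "DateGreaterThan": {"aws:CurrentTime": not_before},
                    "DateLessThan": {"aws:CurrentTime": not_after},
                },
            },
            {
                "Sid": "DenyDeletesWithoutMfa",
                "Effect": "Deny",
                "Action": ["s3:DeleteObject"],
                "Resource": f"{bucket_arn}/*",
                "Condition": {
                    "BoolIfExists": {"aws:MultiFactorAuthPresent": "false"}
                },
            },
        ],
    }


policy = build_agent_policy(
    "arn:aws:s3:::example-agent-workspace",  # hypothetical bucket
    "203.0.113.0/24",                        # documentation-only CIDR range
    "2025-01-01T00:00:00Z",
    "2025-12-31T23:59:59Z",
)
policy_json = json.dumps(policy, indent=2)
```

A document of the same shape could also be supplied as a session policy when assuming a role, which narrows permissions for a single agent invocation without changing the role itself.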

For higher-scope systems, layer AWS Organizations service control policies (SCPs) to establish account-level guardrails that no agent can bypass regardless of its IAM permissions. SCPs help prevent privilege escalation by setting hard boundaries at the organizational level. Explicit limits help reduce the risk of unintended behavior as system autonomy increases.
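
To see why layering matters, consider a deliberately simplified model of how these layers combine: an action is permitted only if every layer allows it and no layer explicitly denies it. The function below is a teaching sketch, not the actual IAM evaluation logic; it ignores wildcards, conditions, resource-based policies, and cross-account access.

```python
def effective_actions(identity_allow, boundary_allow, session_allow,
                      scp_allow, explicit_deny=frozenset()):
    """Intersect the allow sets of every layer, then remove explicit denies."""
    allowed = (
        set(identity_allow)
        & set(boundary_allow)
        & set(session_allow)
        & set(scp_allow)
    )
    return allowed - set(explicit_deny)


# Example: the identity policy is too broad, but the boundary caps it.
identity = {
    "ec2:DescribeSecurityGroups",
    "ec2:RevokeSecurityGroupIngress",
    "iam:CreateUser",
}
boundary = {"ec2:DescribeSecurityGroups", "ec2:RevokeSecurityGroupIngress"}
session = {"ec2:DescribeSecurityGroups", "ec2:RevokeSecurityGroupIngress"}
scp = {
    "ec2:DescribeSecurityGroups",
    "ec2:RevokeSecurityGroupIngress",
    "iam:CreateUser",
}

result = effective_actions(identity, boundary, session, scp)
```

In this example `iam:CreateUser` drops out of `result` because the permissions boundary and session policy don't include it, even though the identity policy and the SCP would allow it.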

Orchestration

Agentic systems often rely on orchestration layers to coordinate tools, services, and other agents. AWS Step Functions provides workflow orchestration and state management, and it supports approval gates by pausing a workflow until a human responds. Structured workflows of this kind help you maintain control over complex multi-agent interactions and apply governance consistently across implementations.
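
A minimal sketch of an approval gate follows, expressed as an Amazon States Language definition built in Python. It uses the Step Functions task-token callback pattern (`.waitForTaskToken`), which pauses the workflow until a reviewer's tooling reports success or failure for the token. The function names, state names, and one-hour timeout are assumptions for illustration.

```python
import json


def approval_gate_definition(notify_function: str, apply_function: str) -> dict:
    """Return a state machine definition that waits for approval before acting."""
    return {
        "Comment": "Pause for human approval before applying a change",
        "StartAt": "RequestApproval",
        "States": {
            "RequestApproval": {
                "Type": "Task",
                # Callback pattern: the workflow waits until the task token
                # is returned with a success or failure result.
                "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
                "Parameters": {
                    "FunctionName": notify_function,
                    "Payload": {
                        "proposed_change.$": "$.proposed_change",
                        "task_token.$": "$$.Task.Token",
                    },
                },
                "TimeoutSeconds": 3600,  # assumed review window
                # Keep the original input and attach the reviewer's response.
                "ResultPath": "$.approval",
                "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "Rejected"}],
                "Next": "ApplyChange",
            },
            "ApplyChange": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {"FunctionName": apply_function, "Payload.$": "$"},
                "End": True,
            },
            "Rejected": {
                "Type": "Fail",
                "Error": "ApprovalNotGranted",
                "Cause": "Reviewer rejected the change or the request timed out",
            },
        },
    }


definition_json = json.dumps(
    approval_gate_definition("notify-reviewer", "apply-change"), indent=2
)
```

Because a rejection or timeout routes to the `Rejected` state, the change is never applied unless a reviewer explicitly approves it, which is the behavior a scope 2 system requires.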

Aligning agentic AI governance with ISO/IEC 42001

ISO/IEC 42001:2023 provides an internationally recognized management system framework for responsible AI use. Organizations can align agentic AI governance with this standard by mapping the six security dimensions to specific Annex A controls.

| Security dimension | ISO 42001 Annex | Alignment |
| --- | --- | --- |
| Identity context | Annex A.9 (responsible AI use) | Agent identity, authorization boundaries, and traceability support responsible use requirements |
| Data, memory, and state protection | Annex A.7 (data for AI systems) | Memory retention policies, data governance, and integrity controls map to data management requirements |
| Audit and logging | Annex A.6 (AI system lifecycle, monitoring) | Decision-context logging and API-level audit trails support lifecycle monitoring and accountability |
| Agent and FM controls | Annex A.4 (AI system impact assessment) | Guardrails, output filtering, and behavioral constraints support impact assessment and risk mitigation |
| Agency boundaries and policies | Annex A.5 (AI system policies) | IAM policies, SCPs, and technical boundary enforcement implement organizational AI policies |
| Orchestration | Annex A.8 (AI system operation) | Workflow controls, approval gates, and state management support operational governance |

This mapping helps organizations build a single governance framework that meets technical security needs and compliance requirements together, without creating parallel control structures. If you already maintain an ISO 42001 management system, the six-dimension model provides a technical implementation layer that maps to your existing control objectives.

AWS compliance posture for agentic AI workloads

Organizations building agentic AI systems in regulated environments need to understand the compliance posture of the underlying services. Amazon Bedrock is the primary AWS service for building agentic AI applications, and its authorization status determines which FMs you can use within your compliance boundary.

Amazon Bedrock FedRAMP authorization

Amazon Bedrock is a Federal Risk and Authorization Management Program (FedRAMP) High authorized service in the AWS GovCloud (US-West) Region and is FedRAMP Moderate authorized in the US East and US West commercial AWS Regions. This means organizations with FedRAMP High requirements can build agentic AI systems in AWS GovCloud (US), and those operating under FedRAMP Moderate can use commercial Regions.

Not all FMs available in Amazon Bedrock carry the same authorization status. For a current list of which Amazon Bedrock FMs are FedRAMP Moderate and FedRAMP High authorized, refer to the Amazon Bedrock models – FedRAMP authorization status page. This page is updated as new models receive authorization. For more information about the overall FedRAMP scope of AWS services, refer to AWS Services in Scope by Compliance Program.

When selecting FMs for agentic systems in regulated environments, verify that your chosen model is authorized at the appropriate FedRAMP level for your use case. A governance framework is only as strong as the compliance posture of the services it relies on.

Data protection in Amazon Bedrock

Your data remains under your control. With Amazon Bedrock, your content isn’t used to improve base models and isn’t shared with model providers. This is a critical consideration for agentic systems that process sensitive data because agent memory, conversation history, and tool outputs constitute data that must remain within your compliance boundary.

Supporting compliance standards

Beyond FedRAMP, AWS supports a broad set of security standards and compliance certifications relevant to agentic AI workloads, including FIPS 140-2 and NIST 800-171. Organizations subject to these requirements can build agentic AI systems on AWS while maintaining their compliance posture, provided they implement appropriate controls at the application layer (which is where the six security dimensions and scope model from this post apply).

Conclusion

If you’re planning to deploy agentic AI in a regulated environment, the time to establish governance is before your first agent goes into production, not after an audit finding forces the conversation.

Start by inventorying every AI system in your environment and classifying it using the four-scope model. Most organizations find they already have scope 1 and scope 2 systems they haven’t formally categorized. After completing that inventory, evaluate your current controls against the six security dimensions. Pay particular attention to audit logging and agency boundaries because these are the areas where existing IT governance frameworks have the largest gaps for agentic systems.

Rather than jumping straight to scope 4, build your governance muscle with prescribed-agency systems where a human still approves every consequential action. Use what you learn to calibrate controls for higher-autonomy systems over time.

The scope-based model in this post isn’t a theoretical exercise. It’s a classification system you can apply to your current AI inventory this week and use to make concrete decisions about what controls each system needs.

Paul Keastead

Paul Keastead is a Senior Security Engineer with AWS Global Professional Services Security, where he builds the operational security mechanisms that govern how ProServe delivers to customers worldwide. His work spans engagement security automation, AI-powered provider risk assessments, and GenAI governance for a global delivery organization across 24 countries. He designs assessment frameworks and tooling that enable providers to demonstrate security posture across complex control environments including CMMC, FedRAMP, and critical infrastructure protection. He brings over a decade of security leadership across the Marine Corps, federal research, and private sector technology compliance.

Satish Uppalapati

Satish Uppalapati is an Associate Assurance Consultant with AWS Security Assurance Services and has more than 8 years of experience in IT risk, governance, and regulatory assurance. He works with AWS customers to help align cloud environments with frameworks such as ISO 27001, SOC 2, and FFIEC. Satish also focuses on advancing governance for AI systems, including emerging standards such as ISO/IEC 42001.