How OCC Built a Governed Cloud Foundation and Then Stress-Tested It

When you are the first to move a critical financial system to the cloud, there is no playbook to follow.

The Options Clearing Corporation (OCC) is a Systemically Important Financial Market Utility (SIFMU). For an organization like OCC, a system outage is not an operational incident; it is a market event. In 2022, OCC became the first SIFMU to receive a Notice of No Objection from the SEC for its proposal to adopt cloud infrastructure. The proposal covered clearing, risk management, and data management applications. There was no precedent to follow. Every architectural decision, every governance control, and every resilience test had to be built from first principles and validated against the expectations of three regulators: the SEC, the CFTC, and the Federal Reserve.

In this post, you learn how OCC partnered with AWS to solve these challenges by automating governance across a multi-account environment with AWS Control Tower and then validating infrastructure resilience through a structured AWS Fault Injection Service (AWS FIS) program. You will see the specific governance gaps that drove the project, the architectural decisions behind the solution, and the measurable outcomes. Whether you’re in financial services or another regulated industry, this approach provides a replicable model for building governed cloud foundations at scale.

The challenge: Governance at scale in a regulated environment

As OCC’s cloud footprint grew to support development, testing, user acceptance testing (UAT), and production workloads across multiple classification tiers (production workload zone, high testing zone, and development zone), the operational reality became clear: governing a regulated multi-account AWS environment through manual processes increases the risk of configuration drift, compliance gaps, and audit failures.

Each tier carried distinct security requirements, network boundaries, and compliance obligations. Without a standardized approach to account provisioning, configurations drifted from the moment an account was created. Credential rotation across more than 50 accounts consumed engineering capacity. Audit evidence required manual effort to produce. Building a new workload zone took three to four months through manual processes, with the risk of inconsistent configurations across zones. The compliance posture of the overall environment became increasingly difficult to demonstrate.

For a SIFMU, this is not a theoretical concern. Regulation Systems Compliance and Integrity (Regulation SCI) governs OCC’s technology infrastructure. It requires policies and procedures for designing, developing, testing, and maintaining the technology systems that support securities markets.

Four specific gaps drove the decision to act:

No standardized account provisioning. Account provisioning was manual, with no baseline blueprint.
Fragmented security monitoring. No single view of compliance posture existed across the entire AWS estate.
Manual credential rotation. Administrative credential rotation was a manual process across every account.
Manual audit evidence generation. Generating audit evidence for non-functional requirements (NFRs) required manual assembly.

OCC’s plans for its workload zone architecture accelerated the urgency. The future-state design called for a centralized control plane, a single enterprise continuous integration and continuous delivery (CI/CD) pipeline, and clearer network separation between workload zones. That design required a governed foundation to build on.

The solution: Automated governance with AWS Control Tower

If you need to enforce consistent governance across a multi-account AWS environment, you can use AWS Control Tower to establish that foundation. Control Tower orchestrates account setup and ongoing governance based on AWS best practices, giving you a unified control plane through AWS Organizations, AWS IAM Identity Center, AWS Config, AWS Security Hub, Amazon GuardDuty, AWS CloudTrail, and IAM Access Analyzer.

For OCC, Control Tower was the right tool, with one important modification. By default, Control Tower uses AWS CloudFormation for deployments. Because OCC standardized on Terraform for infrastructure as code (IaC), the AWS team worked directly with OCC engineers to provision Control Tower using the Account Factory for Terraform (AFT). Deploying through AFT kept the toolchain consistent and avoided a parallel IaC track for the governance layer.
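As a rough illustration of what a governed account request can look like, the following Python sketch assembles and validates the inputs for one AFT account request before they are handed to Terraform. The field names under control_tower_parameters, account_tags, and change_management_parameters follow the AFT account request module; the zone-to-OU mapping, the required tag set, the function name, and all sample values are hypothetical and are not taken from OCC's implementation.

```python
import json

# Hypothetical mapping from workload zone to organizational unit name.
ZONE_TO_OU = {
    "development": "Development",
    "high-testing": "HighTesting",
    "production": "Production",
}

# Assumed tagging policy; the real required tags are not described in the post.
REQUIRED_TAGS = ("owner", "cost-center", "workload-zone")


def build_account_request(name, email, zone, tags, sso_user, requested_by, reason):
    """Return the inputs for one AFT account request as a plain dict.

    sso_user is a (email, first_name, last_name) tuple.
    Raises ValueError if the zone is unknown or a required tag is missing.
    """
    if zone not in ZONE_TO_OU:
        raise ValueError(f"unknown workload zone: {zone}")

    # The zone tag is derived from the request so it cannot disagree with the OU.
    all_tags = {**tags, "workload-zone": zone}
    missing = [tag for tag in REQUIRED_TAGS if tag not in all_tags]
    if missing:
        raise ValueError(f"missing required tags: {missing}")

    sso_email, first_name, last_name = sso_user
    return {
        "control_tower_parameters": {
            "AccountEmail": email,
            "AccountName": name,
            "ManagedOrganizationalUnit": ZONE_TO_OU[zone],
            "SSOUserEmail": sso_email,
            "SSOUserFirstName": first_name,
            "SSOUserLastName": last_name,
        },
        "account_tags": all_tags,
        "change_management_parameters": {
            "change_requested_by": requested_by,
            "change_reason": reason,
        },
    }


if __name__ == "__main__":
    request = build_account_request(
        name="example-dev-account",
        email="aws-example-dev@example.com",
        zone="development",
        tags={"owner": "example-team", "cost-center": "cc-0000"},
        sso_user=("admin@example.com", "Example", "Admin"),
        requested_by="platform-team",
        reason="new development account",
    )
    print(json.dumps(request, indent=2))
```

Validating the request before it reaches the pipeline keeps malformed or untagged accounts from being created in the first place, which is the same idea as provisioning every account from a baseline.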

For OCC, Control Tower provided four concrete capabilities:

1. A governed landing zone. Every new AWS account is provisioned from a standardized baseline, including IAM roles with deny-by-default policies, identity provider (IdP) integration, vulnerability scanning, endpoint detection and response (EDR), Amazon Machine Image (AMI) encryption keys, and IAM Identity Center policies. Standardized baselines prevent configuration drift at the point of account creation.

2. Layered security controls. Preventive controls through service control policies (SCPs) block non-compliant resource deployment before it occurs. Detective controls through AWS Config rules monitor for configuration drift on an ongoing basis across enrolled accounts. In addition, HashiCorp Sentinel policies enforced within the Terraform pipeline prevent non-compliant infrastructure definitions from being deployed, adding a third layer of control at the CI/CD level. Break-glass roles remove the need for manual password rotation.

3. Centralized visibility. An integrated set of governance tools provides regularly updated compliance status across every account, giving the security function a current view of the control environment.

4. Automated evidence generation. OCC now generates compliance reports programmatically, replacing manual audit preparation. Tagging enforcement drives accurate cost reporting and configuration management database (CMDB) data.
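To make the preventive layer concrete, here is a minimal Python sketch that builds one commonly used kind of SCP: a statement that denies requests outside a set of approved Regions. The post does not describe OCC's actual SCPs, so the Region list, the exempted global-service actions, and the function name are assumptions chosen for illustration. The aws:RequestedRegion condition key and the policy document structure are standard IAM policy elements.

```python
import json

# Maximum SCP document size in characters (AWS Organizations quota).
SCP_MAX_CHARS = 5120


def build_region_guardrail_scp(allowed_regions, exempt_actions):
    """Build an SCP that denies every action outside the allowed Regions.

    exempt_actions lists actions for global services that must keep
    working regardless of the requested Region.
    """
    if not allowed_regions:
        raise ValueError("at least one allowed Region is required")

    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideApprovedRegions",
                "Effect": "Deny",
                # NotAction: the deny applies to everything except these actions.
                "NotAction": sorted(exempt_actions),
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {
                        "aws:RequestedRegion": sorted(allowed_regions)
                    }
                },
            }
        ],
    }

    # Fail early if the compact JSON would exceed the SCP size quota.
    if len(json.dumps(policy, separators=(",", ":"))) > SCP_MAX_CHARS:
        raise ValueError("policy exceeds the SCP size limit")
    return policy


if __name__ == "__main__":
    # Illustrative values only.
    scp = build_region_guardrail_scp(
        allowed_regions={"us-east-1", "us-east-2"},
        exempt_actions={"iam:*", "organizations:*", "sts:*", "support:*"},
    )
    print(json.dumps(scp, indent=2))
```

Because an SCP only ever removes permissions, a guardrail like this blocks a non-compliant deployment before any resource exists, while the AWS Config rules described above catch drift in resources that already do.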

Figure 1: AWS Control Tower Multi-Account Architecture

Migrating existing accounts. Moving existing accounts into the Control Tower organization required careful coordination. The team moved each account from the legacy management (payer) account into the new Control Tower structure, with resources remediated to align with the baseline configuration. The team ran migrations in close collaboration with each product team, with the goal of a controlled transition rather than a disruptive one.

The rollout followed four phases:

Planning and migration scoping.
Playbook finalization and CI/CD validation.
Execution and team training.
Production workload zone onboarding.

With the final phase complete, the full workload zone architecture was realized. The result: 2,041 active controls (AWS Foundational Security Best Practices and CIS AWS Foundations Benchmark standards), consistently enforced across the entire AWS estate, with faster account provisioning and a compliance posture OCC can show to its regulators.

Validating resilience with AWS Fault Injection Service

Governance tells you the environment is configured correctly. It does not tell you whether the infrastructure recovers when something breaks. Once you have governance in place, the next step is validating that your infrastructure recovers under real failure conditions.

For a financial market utility, that distinction matters. OCC needed to demonstrate resilience under real failure conditions, not just assert it. That required a different kind of test.

With AWS Fault Injection Service (AWS FIS), you can introduce controlled failures into production-like environments and observe how systems respond. OCC engaged AWS to lead a formal AWS FIS program covering 34 experiment categories, executed from June 2025 to April 2026. The team prioritized these 34 categories from hundreds of potential failure modes identified across OCC’s infrastructure during the engagement scoping phase.

Over that period, the team developed over 100 experiment templates and ran over 350 individual experiments. Approximately half completed successfully. The other half surfaced failures, or the team stopped them. In chaos engineering, this is the expected and desired outcome. Every failure surfaced a specific finding, such as a misconfigured timeout, a missing rollback path, or a gap in how a service handled a node disconnect. The team remediated each finding and re-tested. The cycle repeated until the infrastructure met the resilience requirements.

Experiments covered the following failure categories:

Memory and CPU constraints.
Network latency and packet loss.
Node disconnection and auto-recovery.
Amazon Elastic Block Store (Amazon EBS) volume failures.
Cascading failure scenarios.

AWS FIS architecture and security model. The architecture behind AWS FIS reflects the same least-privilege philosophy as the broader infrastructure. AWS FIS runs from a centralized orchestrator account in the infrastructure organizational unit (OU), assumes scoped IAM roles in target accounts, and uses resource tag filters to limit the scope of each experiment. Experiments affect only resources that have been pre-tagged for experimentation.
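The tag-scoped model can be sketched as an experiment template definition. The Python example below builds a template as a plain dictionary whose keys mirror the fields of an AWS FIS experiment template, then checks that every target is restricted by resource tags. The tag key and value, the role and alarm ARNs, the helper names, and the choice of the aws:ec2:stop-instances action are illustrative assumptions, not OCC's actual templates.

```python
import json


def build_fis_template(description, role_arn, tag_key, tag_value, alarm_arn,
                       restart_after="PT5M"):
    """Build an experiment template that stops one tagged EC2 instance.

    role_arn is the scoped IAM role AWS FIS assumes for the experiment.
    alarm_arn is a CloudWatch alarm that halts the experiment if it fires.
    """
    return {
        "description": description,
        "roleArn": role_arn,
        "targets": {
            "taggedInstances": {
                "resourceType": "aws:ec2:instance",
                # Only resources carrying this tag can be selected.
                "resourceTags": {tag_key: tag_value},
                "selectionMode": "COUNT(1)",
            }
        },
        "actions": {
            "stopInstance": {
                "actionId": "aws:ec2:stop-instances",
                # ISO 8601 duration after which the instance is restarted.
                "parameters": {"startInstancesAfterDuration": restart_after},
                "targets": {"Instances": "taggedInstances"},
            }
        },
        "stopConditions": [
            {"source": "aws:cloudwatch:alarm", "value": alarm_arn}
        ],
    }


def untagged_targets(template):
    """Return the names of targets that are not restricted by resource tags."""
    return [
        name
        for name, target in template["targets"].items()
        if not target.get("resourceTags")
    ]


if __name__ == "__main__":
    # Placeholder ARNs and tag values for illustration.
    template = build_fis_template(
        description="Stop one tagged instance and verify auto-recovery",
        role_arn="arn:aws:iam::111122223333:role/example-fis-experiment-role",
        tag_key="fis-experiment",
        tag_value="enabled",
        alarm_arn="arn:aws:cloudwatch:us-east-1:111122223333:alarm:example-alarm",
    )
    assert untagged_targets(template) == []
    print(json.dumps(template, indent=2))
```

A check like untagged_targets could run in the CI/CD pipeline so that a template without a tag filter never reaches the orchestrator account.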

For on-premises and perimeter network servers (DMZ), AWS Systems Manager hybrid activations handle agent communication through outbound HTTPS, with no inbound firewall rules required. The team grants access to higher-sensitivity environments temporarily for validation only and fully revokes it afterward.

Figure 2: AWS FIS Orchestration Architecture

AWS structured this program as a permanent capability, not a point-in-time engagement. The experiments are fully automated and organized into repeatable scenarios. Resilience testing is now an ongoing operational discipline for OCC, not something the team rebuilds each time it runs.

Results and outcomes

Through this engagement, OCC achieved the following measurable outcomes:

Area	Outcome
Governance controls	OCC now enforces 2,041 active controls consistently across its full AWS estate, reducing compliance risk
Account provisioning	Engineering teams provision new accounts from governed baselines in days instead of three to four months, eliminating manual provisioning errors
Compliance evidence	Audit preparation effort dropped significantly because compliance evidence is now generated programmatically
Credential management	Break-glass roles eliminated manual credential rotation across more than 50 accounts
Resilience testing	The team validated infrastructure recovery across 34 experiment categories, 100+ templates, and 350+ experiments
Ongoing capability	Resilience testing now runs as a permanent, automated operational discipline

The 2,041 controls include AWS Foundational Security Best Practices and CIS AWS Foundations Benchmark standards enforced through AWS Security Hub, with centralized finding aggregation across Amazon GuardDuty, AWS Config, and IAM Access Analyzer.
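As a simplified picture of programmatic evidence, the following Python sketch rolls aggregated findings up into a per-account compliance summary. It assumes findings shaped like the AWS Security Finding Format, using only the AwsAccountId and Compliance.Status fields; the function name, account IDs, and sample findings are made up for illustration and do not represent OCC data.

```python
from collections import Counter, defaultdict


def summarize_compliance(findings):
    """Count findings per account and compliance status.

    Findings without a Compliance.Status are counted as NOT_AVAILABLE.
    """
    summary = defaultdict(Counter)
    for finding in findings:
        status = finding.get("Compliance", {}).get("Status", "NOT_AVAILABLE")
        summary[finding["AwsAccountId"]][status] += 1
    return {account: dict(counts) for account, counts in summary.items()}


if __name__ == "__main__":
    # Illustrative findings with placeholder account IDs.
    sample = [
        {"AwsAccountId": "111122223333", "Compliance": {"Status": "PASSED"}},
        {"AwsAccountId": "111122223333", "Compliance": {"Status": "FAILED"}},
        {"AwsAccountId": "444455556666", "Compliance": {"Status": "PASSED"}},
        {"AwsAccountId": "444455556666"},
    ]
    for account, counts in sorted(summarize_compliance(sample).items()):
        print(account, counts)
```

In practice the input would come from the centralized aggregation described above rather than a hard-coded list, and the summary would feed whatever report format auditors expect.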

For OCC’s engineering teams, this means faster account provisioning, pre-built controls that satisfy NFRs automatically, and compliance evidence that does not require manual assembly before each audit.

For the environment, the workload zone architecture now has the governance control plane and the resilience validation it was designed to support.

Lessons learned

Based on this engagement, the following lessons can help you if you are building governed cloud foundations for regulated workloads:

Start governance before scale. Implementing AWS Control Tower early in a cloud journey is simpler than retrofitting governance onto an existing multi-account environment. If you are retrofitting governance onto existing accounts, plan for careful coordination with each product team during the migration.
Align IaC toolchains from the start. If you deploy Control Tower through Terraform using AFT, you avoid maintaining parallel CloudFormation and Terraform tracks. This reduces operational complexity and keeps the governance layer consistent with the application infrastructure.
Treat resilience testing as an ongoing discipline, not a one-time event. OCC designed its AWS FIS program to run continuously, with automated, repeatable experiments. Continuous testing builds confidence over time and catches regressions that point-in-time testing would miss.
Expect and plan for failures in chaos engineering. Approximately half of OCC’s AWS FIS experiments surfaced issues. Each finding led to a specific remediation. The value of chaos engineering is in the failures it uncovers, not in the tests that pass.
Use least-privilege principles for testing infrastructure. When you design your AWS FIS architecture, use scoped IAM roles, tag-based resource targeting, and temporary access grants. Scoped roles and tag filters limit the scope of each experiment and maintain the security posture of the environment even during fault injection.

Conclusion

OCC now operates a cloud environment where every workload, from development through production, runs inside a consistent governance framework. That framework was built to satisfy the requirements of three federal regulators and validated under real failure conditions.

When OCC began this work, no SIFMU had adopted cloud infrastructure for critical financial systems. There was no precedent to follow and no playbook to borrow. If you are building a governed cloud foundation for regulated workloads, you can apply the governance patterns, toolchain decisions, and resilience testing methodology described here to your own environment.

Learn more:

Set up multi-account governance with AWS Control Tower
Run controlled resilience experiments with AWS Fault Injection Service
Apply reliability best practices from the AWS Well-Architected Framework Reliability Pillar
See industry solutions at AWS for Financial Services

AWS for Industries
