AWS Cloud Operations Blog

Building your operations management with AI-Powered Operations at re:Invent 2025

As organizations continue to scale and evolve their cloud environments, effective operations management has become more critical than ever. Operations management under the Cloud Operations track at AWS re:Invent 2025 offers a comprehensive lineup of sessions designed to help you build resilient, secure, and efficient operational practices across your AWS environment. Whether you’re managing complex multicloud environments, implementing AI-powered automation, or strengthening your disaster recovery strategies, this track has something for everyone.

This blog post will guide you through the key themes of operations management and highlight sessions that will help you transform your cloud operations strategy.

Plan Your Operations Management Track Experience

Operations management under the Cloud Operations track at re:Invent 2025 showcases AWS’s commitment to simplifying cloud operations through intelligent automation. Whether you’re managing a single-cloud environment or complex multicloud infrastructure, these sessions will provide practical strategies to enhance operational efficiency, security, and reliability.

With 12 sessions spanning five key themes, operations management offers something for everyone, from hands-on workshops to expert-level discussions. To make the most of your re:Invent experience, we recommend:

  1. Focus on your priorities: Select sessions that align with your organization’s immediate operational challenges
  2. Mix formats: Combine lecture-style sessions with interactive workshops and builders’ sessions
  3. Plan for skill development: Choose sessions that match your current skill level and those that stretch your capabilities
  4. Reserve early: Popular sessions fill up quickly, so reserve your spot as soon as registration opens

Key Themes at re:Invent for Operations Management

Operations management is organized around five core themes that address today’s most pressing operational challenges:

1. AI-Powered Operations

The integration of generative AI and machine learning into cloud operations represents one of the most transformative shifts in how organizations manage their infrastructure. Sessions in this theme showcase how Amazon Bedrock, Amazon Q, AWS Systems Manager, and other services can be leveraged to create intelligent operational workflows, from automated monitoring to predictive maintenance.

2. Resilience & Disaster Recovery

Building resilient systems that can withstand disruptions is essential for business continuity. The operations management track features sessions that demonstrate how to combine AWS Resilience Hub with generative AI to create sophisticated disaster recovery playbooks, conduct resilience testing, and implement automated recovery procedures.

3. Multicloud Management

As organizations adopt multiple cloud providers, the complexity of managing diverse environments increases exponentially. Learn how AWS provides tools and services that enable centralized visibility and control across your entire cloud estate, whether it’s AWS, on-premises, or other cloud providers.

4. Automation at Scale

Manual operations simply can’t keep pace with the scale and complexity of modern cloud environments. The operations management track offers practical guidance on implementing automation across your cloud operations, from patch management to security incident response.

5. Compliance & Security

Security and compliance remain top concerns for organizations of all sizes. Discover how to implement automated security controls, streamline compliance processes, and build governance frameworks that scale with your business.

Session Formats to Fit Your Learning Style

re:Invent offers a variety of session formats to accommodate different learning preferences. Here are some must-attend sessions by theme:

AI-Powered Operations

COP322 | Building AI-Powered operational insights and automated remediation | Builders’ Session
Location: Wednesday, Dec 3 1:30 PM – 2:30 PM PST | Mandalay Bay
Join this session to build an AI-powered solution that combines Amazon Q, Amazon OpenSearch Service, and AWS Systems Manager to deliver advanced operational insights and automated remediation. You’ll work with Amazon Q via MCP Server to construct queries and implement AI integration with your operational data. Learn to configure OpenSearch for efficient data ingestion using ETL with Amazon S3, enabling near real-time anomaly detection.

COP314 | Scale & automate patching with AI-powered visualization | Workshop
Location: Thursday, Dec 4 12:30 PM – 2:30 PM PST | MGM
In this hands-on workshop, discover how to streamline patch management using AWS Systems Manager and Amazon Q. You will learn how to deploy automated patching solutions, configure compliance reporting, and simplify the creation of dynamic visualizations using natural language queries with Amazon Q.

COP407 | Building custom agents for intelligent AWS patch automation | Code Talk
Location: Wednesday, Dec 3 11:30 AM – 12:30 PM PST | Wynn
In this expert-level code talk, we’ll build an agentic solution with Strands Agents SDK and AWS services to transform patch management. You’ll see how to implement a policy engine that validates compliance requirements before authorizing patches, creating a scalable governance framework that reduces manual work. We’ll walk through turning operational requirements into interactive, production-ready agent workflows that drive automated remediation and rapid vulnerability checks.

Resilience & Disaster Recovery

COP303 | Automate disaster recovery playbooks using generative AI | Builders’ Session 
Location: Thursday, Dec 4 11:00 AM – 12:00 PM PST | Wynn
Discover how to strengthen your mission-critical workloads with automated disaster recovery planning using Amazon Q Developer, AWS Resilience Hub, and AWS Systems Manager. This session demonstrates practical techniques for generating and validating recovery runbooks. Learn how to craft effective prompts that produce architecture-aware recovery plans with comprehensive testing capabilities. You’ll gain insights into modernizing your disaster recovery strategy with automation, helping to reduce risk and improve operational resilience.

COP420 | AI-powered resilience testing and disaster recovery | Breakout Session 
Location: Tuesday, Dec 2 1:30 PM – 2:30 PM PST | Wynn
In this expert-level breakout session, discover how to enhance resilience and disaster recovery on AWS with AI. This approach bridges infrastructure insights and application-level testing, enabling more effective disaster recovery preparation. You will learn how to leverage Large Language Models (LLMs) with AWS Resilience Hub and AWS Systems Manager to modernize testing, analyze infrastructure, and generate targeted AWS Fault Injection Service experiments and recovery runbooks.

Multicloud Management

COP313 | Multicloud & hybrid node operation at scale is easier than you think | Chalk Talk 
Location: Monday, Dec 1 10:30 AM – 11:30 AM PST | Mandalay Bay
In this chalk talk, discover practical solutions for streamlining your operations at scale. We’ll explore how AWS Systems Manager, Amazon CloudWatch, and other services can efficiently handle fleet-wide operations including patching, application deployment, incident resolution, and access control management. These services help thousands of customers operate millions of resources across their distributed compute landscape, reducing operational overhead while maintaining security and compliance.

COP342 | Centralize Multicloud Management using AWS | Breakout Session 
Location: Thursday, Dec 4 11:30 AM – 12:30 PM PST | MGM
Operating in a multicloud environment can introduce operational complexity. In this breakout session, learn to streamline operations with AWS Systems Manager, which allows for easier instance management across all environments. Gain insights into performance with Amazon CloudWatch and Amazon Managed Grafana, which delivers a unified dashboard of metrics and logs from any data source. With these services, you can simplify day-to-day tasks, maintain control, and optimize resources, whether your workloads span AWS, on-premises, or multiple clouds. Take the complexity out of multicloud and focus on what really matters: running your business.

Automation at Scale

COP340 | Building reliable operations, feat. Fannie Mae | Breakout Session 
Location: Tuesday, Dec 2 5:30 PM – 6:30 PM PST | Caesars Forum
This breakout session features a real-world case study from Fannie Mae, showing how the company transformed its operations by building a cross-region observability platform on AWS, which enabled it to automate incident response and improve reliability across its hybrid environment. Walk away with practical strategies for implementing automated incident management, establishing effective on-call processes, and leveraging AWS services to enhance operational reliability within your organization.

COP344 | Implementing Automated Security Controls for Zero-Day Defense | Chalk Talk 
Location: Wednesday, Dec 3 1:30 PM – 2:30 PM PST | MGM
This chalk talk shows how to strengthen your security posture by combining vulnerability management with automated risk controls using Amazon Inspector, AWS Systems Manager, and AWS Security Hub. We’ll demonstrate effective response strategies for zero-day vulnerabilities across Amazon EC2 instances and AWS Regions, implementing automated compliance monitoring and continuous control validation through infrastructure and policy as code.

COP343 | Streamline operations with automated health monitoring and response | Chalk Talk 
Location: Wednesday, Dec 3 12:00 PM – 1:00 PM PST | Mandalay Bay
In this chalk talk, discover how to implement comprehensive health monitoring and automated incident response using AWS Health, Amazon CloudWatch, and AWS CloudTrail. You’ll learn to create effective monitoring patterns, transform metrics into actions, and implement automated remediation workflows.

Compliance & Security

COP310 | Automating compliance and auditing at scale | Workshop 
Location: Wednesday, Dec 3 9:00 AM – 11:00 AM PST | Mandalay Bay
This workshop demonstrates how to build automated compliance controls using AWS Config, Systems Manager, and Audit Manager. Learn to implement automated security assessments and remediation workflows while leveraging Amazon Q CLI and CloudTrail Lake for intelligent investigation.

COP341 | Implement secure automated workflows with AWS Systems Manager | Chalk Talk 
Location: Monday, Dec 1 11:30 AM – 12:30 PM PST | MGM
In this chalk talk, learn to build comprehensive health monitoring and automated incident response using AWS Health, Amazon CloudWatch, and AWS CloudTrail. Get hands-on experience implementing Systems Manager Automation for remediation, creating effective monitoring patterns, and transforming metrics into actions.

Looking Forward

Operations management under the Cloud Operations track at AWS re:Invent 2025 offers a comprehensive look at how organizations can transform their operational practices using the latest AWS services and best practices. From AI-powered automation to multicloud management, the sessions in this track will equip you with the knowledge and skills needed to build resilient, secure, and efficient cloud operations.

We look forward to seeing you at re:Invent 2025, and don’t forget to visit the AWS Cloud Operations kiosk in the Venetian!

Haven’t registered? There’s still time to attend! Register through the re:Invent portal.

Samir Behara

Samir Behara is a Senior Cloud Infrastructure Architect with AWS Professional Services. He is passionate about helping customers accelerate their IT modernization through cloud adoption strategies. Samir has an extensive software engineering background and loves to dive deep into application architectures and development processes to improve performance and operational efficiency and to increase the speed of innovation.