AWS for Industries
Shaping the future of telco operations with an agentic AI collaboration approach
The current state of telco operations: challenges and complexity
Global telco operators today find themselves squeezed between surging network demand and stagnating revenue growth. According to PwC, global telecom service revenue rose by just 4.3% in 2023 and is projected to grow at a Compound Annual Growth Rate (CAGR) of only 2.9% through 2028—below inflation. Average revenue per user (ARPU) for mobile is expected to decline by 1.3% annually over the same period. Furthermore, Ericsson reports that 5G subscriptions are set to quadruple by 2028, reaching ~7.5 billion, creating a structural mismatch between capital intensity and revenue growth.
Operational expenditure (opex) tells a similar story. Omdia notes that global telecom opex decreased by just 0.2% in 2024, to ~$1.63 trillion. Adjusted opex (excluding depreciation/amortization) fell only 0.1% to ~$1.31 trillion.
AI adoption remains nascent. According to Light Reading, operators primarily deploy generative AI in customer care chatbots, while heavier use cases—such as network anomaly detection, predictive maintenance, or process automation—remain immature due to data quality and integration challenges.
All this comes against a backdrop of exploding traffic. Ericsson’s 2025 Mobility Report highlights that mobile data traffic grew 19% year-on-year between Q1 2024 and Q1 2025, reaching 172 exabytes per month, with video accounting for 74% of all mobile data. Ericsson projects this will grow at ~17% CAGR through 2030, effectively doubling or more.
For telco network operators, this means managing multi-domain networks (5G, edge, Open RAN), distributed IT environments, and rising customer expectations—all under flat or shrinking budgets. The reality is stark: traffic and complexity continue to accelerate, while revenue and margins barely move. Traditional centralized analytics and incremental automation are no longer enough. A new approach is needed.
A new model for telco operations based on an agentic AI collaboration approach
The next wave of telco transformation will be driven not by siloed AI deployments, but by agentic architectures—where multiple intelligent agents collaborate across domains to autonomously handle complexity.
The agentic AI collaboration approach for telco operators, built on Amazon Web Services (AWS) prescriptive guidance for agentic AI systems, can define a new model for telco operations. The foundations include the following:
- Local agents fine-tuned for domain expertise: On-premises AI platforms are crucial for meeting internal requirements, regulatory compliance, and government mandates that require data to be kept and processed locally. This is particularly important for strict data residency laws, which dictate the physical or geographical location where data is stored and processed, regardless of its origin. There are over 100 national data privacy laws globally, including notable regulations such as GDPR in the European Union, HIPAA in the United States, the Digital Personal Data Protection Act (DPDP Act) in India, and the Personal Information Protection Law (PIPL) in China. A core compliance principle is that customer and employee data must not be accessible outside its home legal jurisdiction unless explicit, per-use consent is given. The 3rd Generation Partnership Project (3GPP), the global standards development organization for mobile telecommunications, has significantly expanded its work on standardized user consent mechanisms for 5G systems. These efforts emphasize user data privacy for collected data both within operator domains and when shared with third parties through northbound interfaces, including privacy-sensitive data such as User Equipment (UE) identifiers, location information, and measurement data.
To comply with these requirements, the solution is to use local agents fine-tuned for domain expertise, deployed on AWS Outposts directly inside telco data centers. Using lightweight fine-tuning techniques such as Low-Rank Adaptation (LoRA) on Amazon SageMaker AI, these agents become true experts in their domains, for example RAN optimization, transport network monitoring, and IT ticket triage (a fine-tuning sketch follows this list). Fine-tuning becomes crucial when a telco possesses unique or proprietary data that differs significantly from the broad datasets on which base models were pre-trained. Local deployment makes sure that sovereignty and compliance requirements are met and that sensitive network data never leaves the operator’s premises.
Open-weight large language models (LLMs) offer flexible deployment options and can be effectively fine-tuned for specific domain applications. They can be deployed on AWS Outposts, maintaining data sovereignty while providing specialized expertise for telco operations. Through extensive testing, we’ve observed that larger models do not always deliver a proportional improvement on domain-specific tasks. Models in the 8-billion to 30-billion parameter range have demonstrated optimal performance as domain agents for telco network operations, striking a balance between computational efficiency and domain expertise. These right-sized models, when fine-tuned with proprietary telco data, provide excellent results while remaining within the computational constraints of on-premises AWS infrastructure, sparing telco providers from heavy CapEx hardware investments.
- Supervisory agents orchestrating the domain-specific agents: Built with Amazon Bedrock AgentCore and hosted centrally on AWS, the super-agent has a holistic view of the entire network and IT stack. It can detect cross-domain dependencies, prioritize incidents, and assign tasks to the most relevant agents (an orchestration sketch follows this list).
The benefits of this approach are multi-layered. First, it drives speed and precision. Localized models embedded at the edge are closer to the source of the data and context. They can detect anomalies, predict failures, or optimize configurations within milliseconds—without waiting for a round trip to a centralized cloud. For latency-sensitive functions such as network slicing, dynamic spectrum allocation, or security incident response, this difference is decisive.
Second, it enables a new scale of orchestration. Local agents manage domain-specific tasks, while a super-agent—orchestrating across multiple specialized agents—creates end-to-end intelligence. The AWS Agentic AI Framework describes this pattern as “collaboration through orchestration,” where agents interact with each other, exchange insights, and coordinate actions. In a telco context, this means that a root-cause issue identified in the RAN can trigger automated responses across transport, core, and customer service, reducing mean time to resolution from hours to minutes.
The super-agent must coordinate multiple domain-specialized agents, make cross-domain decisions, and plan multi-step workflows, while remaining cost-effective. Amazon Nova Pro is optimized for complex reasoning and planning, which is critical for orchestrating telco operations across IT, OSS, BSS, and network domains. It handles large inputs, such as network logs, telemetry data, incident histories, and CRM cases, enabling holistic visibility across the telco infrastructure.
Figure 1 Agentic AI collaboration building blocks
- Inter-agent collaboration: A2A protocol and MCP integration: A key enabler of an effective agentic AI framework in telco operations is robust inter-agent communication and system integration. This is achieved through two complementary mechanisms: the Agent-to-Agent (A2A) protocol and the Model Context Protocol (MCP).
The A2A protocol enables seamless collaboration between local edge agents and the centralized super-agent, as well as among the local agents themselves. It supports asynchronous, multi-agent coordination, allowing agents to share insights, propose actions, and request validation from peers or the super-agent. It enables strands-based chaining, where the output of one agent can trigger subsequent actions by another agent, forming complex workflows without human intervention. It maintains auditability and traceability, logging all inter-agent communications for compliance and post-incident analysis. It reduces escalations by enabling edge agents to resolve issues collaboratively before notifying the central team. It facilitates distributed decision-making, delivering faster, context-aware responses to network anomalies. And it creates a flexible architecture where new agents or skills can be added without disrupting existing workflows.
The Model Context Protocol (MCP) serves as a standardized interface between agents and the operational systems (network functions, OSS/BSS, and IT systems). It provides secure, structured access to telemetry, logs, configuration, and service workflows, and it makes sure that agents can execute actions safely, from automated remediation to guided workflow recommendations. It supports tool integration and orchestration, allowing agents to invoke specialized diagnostic tools or network APIs. It standardizes communication across heterogeneous systems, reducing the complexity and cost of integrating legacy network functions, and it enables repeatable, scalable operations, where agents consistently interact with systems in a predictable, auditable manner (an MCP sketch follows this list).
Together, A2A and MCP create a collaborative, secure, and scalable ecosystem, as shown in the following figure. Edge agents can act autonomously yet remain aligned with the global operational strategy defined by the super-agent. Complex cross-domain workflows, such as multi-vendor 5G troubleshooting or CRM-linked service resolution, run automatically, rapidly, and safely.
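As an illustration of the local-agent fine-tuning described above, the following is a minimal sketch of a LoRA job that could run as an Amazon SageMaker AI training job. The base model ID, dataset file, target modules, and hyperparameters are placeholder assumptions, not recommendations from this post.

```python
# Minimal LoRA fine-tuning sketch for a domain agent (placeholder model, data, and hyperparameters).
# In practice this script would be packaged as an Amazon SageMaker AI training job.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer, DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

BASE_MODEL = "meta-llama/Llama-3.1-8B"       # placeholder open-weight base model
DATASET = "telco_incident_tickets.jsonl"     # placeholder proprietary dataset

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# LoRA freezes the base weights and trains small low-rank adapters, which keeps
# fine-tuning within the compute budget of right-sized on-premises hardware.
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

dataset = load_dataset("json", data_files=DATASET, split="train")
dataset = dataset.map(
    lambda row: tokenizer(row["text"], truncation=True, max_length=1024),
    remove_columns=dataset.column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="./ran-agent-lora", num_train_epochs=2,
                           per_device_train_batch_size=2, learning_rate=2e-4),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("./ran-agent-lora")    # saves only the adapter weights, a small fraction of the base model
```

The resulting adapter can be loaded alongside the base model on the Outposts-hosted inference stack, so both the proprietary training data and the tuned weights stay inside the operator’s data center.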
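Similarly, the following is a minimal sketch of how a supervisory agent could triage an incident and select a domain agent using the Amazon Bedrock Converse API with tool definitions. The tool names, incident text, and model ID are illustrative assumptions; a production super-agent would run as a managed agent (for example on Amazon Bedrock AgentCore) with many more tools and guardrails.

```python
# Minimal supervisor-routing sketch using the Amazon Bedrock Converse API.
# Tool names, the incident payload, and the dispatch logic are illustrative assumptions.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Each tool represents a domain agent the supervisor can delegate to.
DOMAIN_AGENT_TOOLS = {
    "tools": [
        {"toolSpec": {
            "name": "ran_agent",
            "description": "Diagnoses and remediates RAN issues such as cell outages and interference.",
            "inputSchema": {"json": {"type": "object",
                                     "properties": {"incident": {"type": "string"}},
                                     "required": ["incident"]}}}},
        {"toolSpec": {
            "name": "transport_agent",
            "description": "Handles transport network alarms and backhaul link degradation.",
            "inputSchema": {"json": {"type": "object",
                                     "properties": {"incident": {"type": "string"}},
                                     "required": ["incident"]}}}},
    ]
}

incident = "Cell site NR-4711 reports high PRB utilization and rising packet loss on its backhaul link."

response = bedrock.converse(
    modelId="amazon.nova-pro-v1:0",   # placeholder ID for the reasoning model used by the supervisor
    system=[{"text": "You are a telco operations supervisor. "
                     "Route the incident to the domain agent best suited to handle it."}],
    messages=[{"role": "user", "content": [{"text": incident}]}],
    toolConfig=DOMAIN_AGENT_TOOLS,
)

# When the model decides to delegate, the response carries a toolUse block that the
# surrounding orchestration layer would forward to the selected domain agent over A2A.
for block in response["output"]["message"]["content"]:
    if "toolUse" in block:
        print("Route to:", block["toolUse"]["name"], block["toolUse"]["input"])
```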
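Finally, on the MCP side, here is a minimal sketch of an MCP server that exposes telemetry and remediation tools to agents, built with the FastMCP helper from the MCP Python SDK. The tool names, KPI fields, and backing systems are hypothetical; a real server would call the operator’s OSS/BSS and network APIs.

```python
# Minimal MCP server sketch exposing hypothetical telemetry and remediation tools to agents.
# Uses the FastMCP helper from the MCP Python SDK (pip install "mcp[cli]").
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("telco-operations")

@mcp.tool()
def get_cell_kpis(cell_id: str) -> dict:
    """Return the latest KPIs for a cell. A real implementation would query the
    OSS performance-management system instead of returning canned values."""
    return {
        "cell_id": cell_id,
        "prb_utilization_pct": 87.5,   # illustrative values only
        "drop_call_rate_pct": 0.4,
        "active_users": 1320,
    }

@mcp.tool()
def restart_cell(cell_id: str) -> str:
    """Request a controlled cell restart through a (hypothetical) RAN management workflow."""
    return f"Restart of {cell_id} scheduled via the RAN management workflow."

if __name__ == "__main__":
    # Local and supervisory agents connect to this server and invoke the tools above
    # in a structured, auditable way instead of integrating with each system directly.
    mcp.run()
```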
This architecture enables a fundamental shift: moving from centralized, human-driven troubleshooting to distributed, autonomous, and collaborative operations. Telco operators can gain faster incident resolution, lower operational costs, and improved customer satisfaction, while maintaining full control over sensitive data and compliance.
Benefits: from cost savings to operational agility
According to McKinsey, telco operators using generative AI have already cut costs in their customer service operations. The benefits are multi-dimensional, addressing some of the industry’s most pressing challenges:
- Latency and resilience: Local agents on AWS Outposts support near-real-time optimization of network performance. AWS Outposts is ideally suited for workloads that demand low-latency access to on-premises systems, need local data processing, have data residency mandates, or involve the migration of applications with local system interdependencies. A significant advantage of AWS Outposts is that AWS handles all aspects of its management, from installation to monitoring, patching, and hardware updates, providing high availability and reliability for the operator.
- Data sovereignty and compliance: Sensitive data is processed locally, reducing regulatory risks. AWS Outposts enables telcos to embrace a “cloud-native everywhere” strategy, extending the benefits of cloud agility to the most sensitive data that must reside on-premises. The consistency offered by AWS Outposts means that telcos can develop, deploy, and manage their AI agents using the same cloud-native tools, processes (for example, continuous integration/continuous delivery (CI/CD)), and skillsets, whether those agents run in an AWS Region or on-premises (see the inference sketch after this list).
- Customization and expertise: Fine-tuned models increase accuracy in anomaly detection, root cause analysis (RCA), and resolution. Although LLMs demonstrate impressive out-of-the-box performance across diverse tasks, their general knowledge often falls short when applied to highly specialized domains such as telecommunications. Fine-tuning an LLM provides significant benefits for tasks that need specific domain knowledge, adherence to a particular style or format, or higher accuracy.
- Cost savings and efficiency: McKinsey studies on AIOps and IT automation suggest that automating 60–70% of incidents is achievable when AI scales across IT and network domains. Based on current telco cost models, this can reduce operational costs by 6% to 9%. For example, BT reported that its AIOps rollout fixed around 23% of incidents across its digital estate, and in FY24 it reported £35 million in annual savings from AI-powered automation in Openreach.
- CapEx reduction: Based on publicly available telco AI and automation case studies, a reasonable estimate for CapEx reduction from eliminating legacy software licenses, maintenance, and support costs when adopting an agentic AI approach is in the range of 15%–25% of total software-related CapEx: roughly 10–15% from license elimination and 5–10% from reduced maintenance and support.
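To make the consistency point above concrete, the following is a minimal sketch of a client calling a SageMaker AI inference endpoint that hosts a fine-tuned domain agent. The endpoint name, Region, and payload format are hypothetical; the point is that the same boto3 call works whether the endpoint is deployed in an AWS Region or on AWS Outposts.

```python
# Minimal inference sketch against a SageMaker AI endpoint hosting a fine-tuned domain agent.
# Endpoint name, Region, and payload format are hypothetical; the identical client code
# applies whether the endpoint runs in-Region or on AWS Outposts.
import json
import boto3

runtime = boto3.client("sagemaker-runtime", region_name="eu-central-1")

payload = {
    "inputs": "Alarm: S1 link flapping on site FRA-2231. Suggest the probable root cause.",
    "parameters": {"max_new_tokens": 256, "temperature": 0.2},
}

response = runtime.invoke_endpoint(
    EndpointName="ran-domain-agent-lora",   # placeholder endpoint name
    ContentType="application/json",
    Body=json.dumps(payload),
)

result = json.loads(response["Body"].read())
print(result)   # the exact response schema depends on the serving container
```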
Next steps: building the path toward autonomous telco operations
The telco industry stands at an inflection point. Traffic, complexity, and customer expectations are accelerating faster than revenue and margins. Traditional operating models—whether through incremental automation or siloed AI deployments—cannot close this gap. A new paradigm is needed: one where distributed, fine-tuned agents act locally, while orchestration agents unify intelligence across the network. This is the promise of Agentic AI, and it is now achievable with AWS.
AWS provides the building blocks to begin this journey today, as shown in the following figure.
Figure 2 AWS stack for agent development
- Amazon Bedrock gives secure access to foundation models (FMs), without the need to manage infrastructure.
- Amazon SageMaker AI enables telcos to fine-tune and retrain domain-specific models using proprietary network data.
- AWS Outposts brings compute and intelligence directly to the edge, in telco provider data centers.
- Agentic AI frameworks and patterns from AWS prescriptive guidance provide best practices for designing, deploying, and orchestrating multi-agent systems at scale.
For telco providers, the next step is not a wholesale transformation overnight, but a focused path of experimentation. Identify high-value operational domains—such as incident management, predictive maintenance, or customer assurance—and pilot distributed agent deployments with measurable KPIs. From there, scale horizontally across the network, layering orchestration for true end-to-end automation. The operators who embrace this shift will reshape their networks into adaptive, autonomous platforms that can support the next generation of digital services, moving towards the future of telco operations.