AWS Partner Network (APN) Blog

How PwC uses AI as the Architect Building a Next-Gen Managed Cloud Service Platform in AWS

By Daryl Li, Partner – PwC HK
By David Lau, Senior Partner Solution Architect – AWS HK


In today’s dynamic tech landscape, organizations grapple with numerous challenges in cloud environment management, from complex compliance requirements to data breach risks. To address these issues, a next-gen managed cloud service platform is essential, incorporating CloudOps quality gates, advanced FinOps capabilities, and Cloud Security Posture Management (CSPM) for enhanced site reliability.

This blog explores PwC’s approach to leveraging Artificial Intelligence (AI) in architecting such a platform on Amazon Web Services (AWS). We’ll delve into AI use cases within PwC consultation services that tackle common cloud management challenges, including cross-cutting threat detection, anomaly identification, compliance & CSPM, cost optimization, and operational efficiency. Readers will gain valuable insights into constructing an AI-driven cloud service platform on AWS, addressing critical aspects of modern cloud operations.

The Need for a Next-Gen Cloud Service Platform

As technological environments evolve, companies face challenges in maintaining compliance with complex security and regulatory standards. Misconfigurations and improper handling of cloud environments pose significant risks that could lead to data breaches and unauthorized access. A next-gen cloud service platform has become essential to address these cloud management complexities.

Enterprises also face difficulties in managing billing administrative overhead in multi-cloud environments, adding to the complexity of their operations. Furthermore, a lack of visibility into resource inventory hampers accurate capacity forecasting and planning. Many organizations struggle to bridge the skills gap in their IT teams and keep pace with rapidly evolving cloud services. Additionally, insufficient vendor support for cloud operations and application troubleshooting often leaves customers facing significant challenges.

Cloud services banner highlighting security, integration, optimization, reliability, and talent development as key features for enterprise IT solutions.

Figure 1 – Key features of next-gen cloud service platform

To address these multifaceted issues, a next-gen cloud service platform should incorporate several key features (see Figure 1). A CloudOps quality gate is essential to prevent risky configurations that could breach security or compliance standards. Improved management of cloud costs across multiple providers and business units, through advanced billing, FinOps, and cost optimization capabilities, is crucial. The platform should also offer cloud capability extension, providing access to professional talent to supplement in-house skills. Well-established operation procedures are necessary to prevent data leakage and security breaches. Lastly, the platform should leverage AIOps and advanced monitoring capabilities to improve site reliability. By integrating these features, a next-gen cloud service platform can address the complex challenges organizations face in managing their cloud environments, enabling more efficient, secure, and cost-effective cloud operations.

Leveraging AI in Platform Architecture

AI is changing how next-gen cloud service platforms are developed and deployed. In PwC’s cloud environment construction practice, AI generated 80% of the Terraform code for this platform, automating and streamlining much of the development process. This AI-driven approach enables faster iteration and deployment of cloud infrastructure, significantly reducing manual effort and the potential for human error.

At its core, the platform adopts a quality-first approach, utilizing AI to ensure adherence to security and compliance standards, implement 24/7 infrastructure monitoring with well-aligned operation procedures, and maintain a CloudOps quality gate to prevent risky configurations. AI also plays a crucial role in continuous utilization review and automated capacity forecasting, ensuring optimal resource allocation.

Cost control is another area where AI demonstrates its value. The platform employs AI-driven automation to implement strict cost control measures: Service Control Policies (SCPs) that limit resource usage and restrict unwanted actions, IAM policies for granular resource control, and region-specific service restrictions. AI algorithms also help control which Amazon Elastic Compute Cloud (EC2) instance types can be launched and how Amazon Elastic Block Store (EBS) volumes are created, optimizing cost efficiency without compromising performance.
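
To make the cost controls above more concrete, the following is a minimal sketch of a Service Control Policy document built in Python. It is not PwC's actual policy: the allow-lists, the volume size limit, the list of exempted global services, and the function name are all hypothetical. The condition keys used (aws:RequestedRegion, ec2:InstanceType, ec2:VolumeSize) are standard AWS condition keys.

```python
import json

# Hypothetical allow-lists and limit, for illustration only. A real deployment
# would derive these from the client's approved regions and instance families.
ALLOWED_REGIONS = ["ap-east-1", "ap-southeast-1"]
ALLOWED_INSTANCE_TYPES = ["t3.micro", "t3.small", "t3.medium"]
MAX_EBS_VOLUME_GIB = 500


def build_cost_control_scp(regions, instance_types, max_volume_gib):
    """Return an SCP document (as a dict) with three Deny statements."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Deny everything outside the approved regions, except a few
                # global services (an assumed, incomplete exemption list).
                "Sid": "DenyOutsideApprovedRegions",
                "Effect": "Deny",
                "NotAction": ["iam:*", "organizations:*", "sts:*", "support:*"],
                "Resource": "*",
                "Condition": {"StringNotEquals": {"aws:RequestedRegion": regions}},
            },
            {
                # Deny launching EC2 instances whose type is not on the allow-list.
                "Sid": "DenyUnapprovedInstanceTypes",
                "Effect": "Deny",
                "Action": "ec2:RunInstances",
                "Resource": "arn:aws:ec2:*:*:instance/*",
                "Condition": {"StringNotEquals": {"ec2:InstanceType": instance_types}},
            },
            {
                # Deny creating EBS volumes larger than the configured size (GiB).
                "Sid": "DenyOversizedEbsVolumes",
                "Effect": "Deny",
                "Action": "ec2:CreateVolume",
                "Resource": "arn:aws:ec2:*:*:volume/*",
                "Condition": {
                    "NumericGreaterThan": {"ec2:VolumeSize": str(max_volume_gib)}
                },
            },
        ],
    }


if __name__ == "__main__":
    scp = build_cost_control_scp(
        ALLOWED_REGIONS, ALLOWED_INSTANCE_TYPES, MAX_EBS_VOLUME_GIB
    )
    print(json.dumps(scp, indent=2))
```

Generating the policy from parameters rather than hand-writing JSON is one way such guardrails could be produced and reviewed consistently across accounts.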

In identity and privileged access management, AI enhances security by enabling managed identity governance with automated access provisioning and certification. It facilitates privileged access management with multi-factor authentication (MFA) enforcement and just-in-time access, while also generating comprehensive identity risk and compliance reports. These AI-powered features significantly reduce the risk of unauthorized access and potential security breaches.

Maintaining cloud security hygiene is paramount, and the AI-architected platform incorporates a comprehensive CSPM system. This system leverages AI to identify and prioritize security issues, implement policies based on industry benchmarks and best practices, and continuously enhance CSPM policies with custom rules. The AI’s ability to analyze vast amounts of security data in real-time ensures that the platform remains resilient against evolving cyber threats.

Leveraging AWS Services for AI Integration

This section shares two examples of how PwC applied AI in architecting this next-gen cloud service platform for its clients: Enhanced Cross-cutting Threats and Anomaly Detection, and Cloud Security Posture Management Enhancement.

Enhanced Cross-cutting Threats and Anomaly Detection

Following the AWS Well-Architected Operational Excellence pillar, PwC’s best practice is to implement continuous monitoring of all aspects of operating applications in the cloud. Figure 2 below shows a simplified container-based application running in Amazon Elastic Kubernetes Service (EKS) and how various AWS management, security, and compliance services work together to continuously monitor the health and security of its operations.

The table below lists the type of data written into Amazon Simple Storage Service (S3):

| AWS Service | Data Written into S3 |
| --- | --- |
| AWS CloudTrail | CloudTrail logs of API invocations to AWS Services |
| AWS Config | AWS resources configuration items, snapshots, and history files |
| VPC Flow Logs | Inter- and intra-VPC network traffic flow records |
| Amazon GuardDuty | Security issues/threats findings, malware scan results and S3 protection findings |
| AWS Security Hub | AWS Security Hub is a cloud security posture management service that automates best practice checks, aggregates alerts and vulnerability assessment results from security services like Amazon Inspector and Amazon GuardDuty, and supports automated remediation. |

Figure 2 – Sample Amazon EKS architecture with various supporting AWS services

During normal operations, PwC can automate the real-time ingestion of current security events into a custom threat and anomaly detection AI model (described in the next paragraph), which detects unusual patterns or behaviors across network traffic, resource configuration, and user activities. In addition, these inputs can be fed to Amazon Bedrock to fine-tune a custom Large Language Model (LLM) for the client that provides actionable insights in natural language. When integrated with Amazon Bedrock Agents, it can also trigger automated remediation workflows.

Using VPC Flow Logs, AWS Config data, AWS CloudTrail logs, and historical findings from AWS Security Hub and Amazon GuardDuty, we use Amazon SageMaker to fine-tune a threat and anomaly detection AI model on the combined security datasets specifically for this customer. This custom model is then used to analyze real-time events for anomalies.
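
As a rough illustration of the kind of signal such a model looks for, the sketch below flags source addresses with unusually high byte counts in VPC Flow Logs. It is a deliberately simple statistical stand-in (a median-based outlier score), not the SageMaker model described above. It assumes the default version 2 flow log record format; the function names and the 3.5 threshold are hypothetical choices for this example.

```python
from collections import defaultdict
from statistics import median

# Field order of the default (version 2) VPC Flow Log record format.
FIELDS = (
    "version account_id interface_id srcaddr dstaddr srcport dstport "
    "protocol packets bytes start end action log_status"
).split()


def parse_flow_record(line):
    """Parse one space-separated flow log record; return None if unusable."""
    values = line.split()
    if len(values) != len(FIELDS):
        return None
    record = dict(zip(FIELDS, values))
    # NODATA and SKIPDATA records carry '-' in the numeric fields, so skip them.
    if record["log_status"] != "OK":
        return None
    return record


def bytes_per_source(lines):
    """Sum the bytes field per source address across all usable records."""
    totals = defaultdict(int)
    for line in lines:
        record = parse_flow_record(line)
        if record is not None:
            totals[record["srcaddr"]] += int(record["bytes"])
    return dict(totals)


def flag_outliers(totals, threshold=3.5):
    """Return sources whose byte total is far above the median.

    Uses the modified z-score, 0.6745 * (x - median) / MAD, and flags only
    the high side. Returns an empty list when there are too few sources or
    when the median absolute deviation (MAD) is zero.
    """
    values = list(totals.values())
    if len(values) < 3:
        return []
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []
    return sorted(
        src for src, v in totals.items() if 0.6745 * (v - med) / mad > threshold
    )
```

A trained model would combine many more features (CloudTrail activity, configuration changes, prior findings); the point here is only the shape of the pipeline: parse, aggregate, score, flag.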

Cloud Security Posture Management Enhancement

PwC offers consultation services to help clients enhance their CSPM. This is done using a Cloud Native Application Protection Platform (CNAPP), loading CSPM rules into compliance scanning in batches prioritized from high to low risk.

Figure 3 below shows how PwC uses a set of benchmarks and standards (CIS 2.0, PwC DarkLab Cloud Best Practices, and the AWS Well-Architected Framework) to determine CSPM compliance. The CSPM rules are separated into three batches:

  • Batch 1 consists of baseline security using CIS 2.0 benchmarks plus all high-risk policies from PwC DarkLab Cloud Best Practices and the AWS Well-Architected Framework
  • Batch 2 consists of intermediate standards using all medium-risk and some low-risk policies, with the aim of enabling clients to detect a large percentage of non-compliance risks
  • Batch 3 adopts the full set of CSPM policies developed according to best practices, for comprehensive detection capabilities across security and operations optimization

Three-step CSPM implementation process showing Baseline Security, Intermediate Standards, and Full Best Practices, each with key focus areas for cloud security posture management.

Figure 3 – CSPM scanning process
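
The three-batch rollout can be sketched as a small classification function. This is an illustrative assumption about how the batching rules might be encoded, not PwC's tooling: the Policy fields, the source labels, and the early_low flag (marking which low-risk policies are pulled forward into Batch 2, a selection the text does not specify) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    policy_id: str
    risk: str                # "high", "medium", or "low"
    source: str              # e.g. "CIS", "DarkLab", "Well-Architected"
    early_low: bool = False  # hypothetical: low-risk policy included in Batch 2


def assign_batch(policy):
    """Return the first batch (1, 2, or 3) in which a policy is enabled.

    Batches are cumulative: Batch 3 runs the full policy set, so a policy
    introduced in Batch 1 stays active in Batches 2 and 3.
    """
    # Batch 1: CIS baseline plus every high-risk policy.
    if policy.source == "CIS" or policy.risk == "high":
        return 1
    # Batch 2: all medium-risk policies plus the selected low-risk ones.
    if policy.risk == "medium" or (policy.risk == "low" and policy.early_low):
        return 2
    # Batch 3: everything that remains.
    return 3
```

For example, assign_batch(Policy("s3-public-read", "high", "DarkLab")) returns 1, while a low-risk policy without the early_low flag lands in Batch 3.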

Implementing the CSPM scan for this customer involves 193 policies in total, of which 48 are high risk, 49 are medium risk, and 96 are low risk. Given that each rule takes on average 4 hours to develop, deploy, and execute, the 48 high-risk rules alone translate to 192 hours, or roughly one person-month of work.
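
The effort estimate can be checked with a few lines of arithmetic. The policy counts and the 4-hour average come from the text; the 40-hour working week is an assumption added here to convert hours into weeks.

```python
HOURS_PER_RULE = 4  # average effort per rule, from the text
POLICY_COUNTS = {"high": 48, "medium": 49, "low": 96}  # from the text
HOURS_PER_WEEK = 40  # assumption: one person working a 40-hour week


def effort_hours(rule_count, hours_per_rule=HOURS_PER_RULE):
    """Total hours to develop, deploy, and execute the given number of rules."""
    return rule_count * hours_per_rule


def effort_weeks(rule_count, hours_per_week=HOURS_PER_WEEK):
    """Same effort expressed in person-weeks."""
    return effort_hours(rule_count) / hours_per_week


if __name__ == "__main__":
    for risk, count in POLICY_COUNTS.items():
        print(f"{risk}: {count} rules, {effort_hours(count)} h, "
              f"{effort_weeks(count):.1f} person-weeks")
    total = sum(POLICY_COUNTS.values())
    print(f"total: {total} rules, {effort_hours(total)} h, "
          f"{effort_weeks(total):.1f} person-weeks")
```

Under these assumptions the 48 high-risk rules come to 192 hours, and the full set of 193 policies to 772 hours.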

When PwC partnered with AWS for these clients, the PwC consulting team collected historical incident data and used SageMaker to build a deep-learning model customized for an airline client. Using this model, PwC adapted to the client’s situation and prioritized the 50,000+ CSPM violations down to the top 10 riskiest rules, applying mitigations within the first week instead of waiting 4 weeks for the 48 rules in Batch 1. This pairing of AI with PwC consultants has streamlined the customer’s protection and allowed PwC to deploy human resources where they matter most.
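
The idea of collapsing a large number of violations into a short list of rules to fix first can be sketched as follows. This is a simple heuristic stand-in (violation count multiplied by a risk weight), not the deep-learning model described above; the weights, the record shape, and the function name are hypothetical.

```python
from collections import Counter

# Assumed weights for illustration; a trained model would learn its own ranking.
RISK_WEIGHT = {"high": 3.0, "medium": 2.0, "low": 1.0}


def top_rules(violations, rule_risk, k=10):
    """Rank rules by risk-weighted violation count and return the top k rule IDs.

    violations: iterable of dicts, each with a "rule_id" key.
    rule_risk:  mapping of rule_id to "high", "medium", or "low".
    Ties are broken by rule ID so the ordering is deterministic.
    """
    counts = Counter(v["rule_id"] for v in violations)
    scores = {
        rule: count * RISK_WEIGHT[rule_risk[rule]] for rule, count in counts.items()
    }
    return sorted(scores, key=lambda rule: (-scores[rule], rule))[:k]
```

Grouping by rule rather than by individual violation is what makes the list actionable: one mitigation per rule can clear many violations at once.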

Conclusion

The evolving cloud landscape demands innovative solutions to maximize investments and minimize risks. PwC’s cloud service methodologies, combined with AWS services, offer organizations a powerful approach to navigate complex cloud environments. This collaboration enables businesses to:

  • Enhance agility and adaptability
  • Ensure robust compliance measures
  • Maintain a competitive edge in the digital era

By leveraging this AI-augmented next-gen cloud service platform, companies can confidently optimize their cloud strategies, driving sustainable growth and innovation in an increasingly cloud-centric business world.


PwC – AWS Partner Spotlight

PwC is an AWS Premier Tier Services Partner that helps you drive innovation throughout IT and the business to compete in today’s service economy.
