AWS Partner Network (APN) Blog

IBM Consulting Platform Services on AWS Supports Automated and Intelligent Cloud Operations

By Krish Subramanian, Product Leader, Platform Services – IBM Consulting
By Dhana Vadivelan, Sr. Manager, Solutions Architecture Leader, GSI – AWS

Imagine that a one-second delay in application response leads to 11% less customer traffic, a 16% decline in customer satisfaction, and 7% fewer customer conversions. How does that impact your business?

Applications are the backbone of any digital business, and keeping them running without disruption, poor performance, or outages is a Herculean task. As businesses drive cloud adoption, they are challenged with increased complexity, tech sprawl, data and metric overload, and alert fatigue.

In cloud-native environments, companies often struggle with complex tools, processes, and technologies, as well as high volumes of logs, alerts, and other data. Maintaining application performance and business outcomes requires extensive skills and expertise that aren't easily available in the market.

These issues delay businesses in realizing the value of their IT investments and lead to suboptimal application performance and resiliency.

IBM Consulting Platform Services on AWS enables intelligent application management in the Amazon Web Services (AWS) environment. It empowers IT leaders, developers, and site reliability engineers with artificial intelligence (AI)-powered management of complex hybrid and cloud-native enterprise applications.

By simplifying and overseeing the IT landscape, the solution increases application reliability, automates issue resolution so problems are fixed faster, and optimizes costs, which together drive client and brand satisfaction.

In this post, we will talk about how IBM Consulting Platform Services on AWS is built using AWS-native services, helping enterprise customers embrace the new cloud operating model by streamlining Day 2 operations.

We’ll also help IT and application operations leaders understand how IBM Consulting Platform Services can streamline operations while improving application resiliency and operational efficiency. IBM is an AWS Premier Tier Services Partner and Managed Service Provider (MSP).

Streamlining Day 2 with Observability and AI Automation

The new cloud operating model requires applications to be designed and instrumented for observability, so that the data they generate can be used to gain insights and proactively discover and remediate issues and potential problems. The new model also allows enterprises to use large-scale automation to streamline operations, reduce human error, and bring agility to cloud operations.

While this may seem daunting to enterprises using traditional operating models, IBM Consulting brings its application management expertise to an offering built on artificial intelligence (AI), machine learning (ML), automation, and open-source technologies. The offering helps deliver next-generation IT operations that enhance overall business availability and performance.
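One common way to instrument an application for observability on AWS is to emit metrics as structured log records. The minimal sketch below builds a record in CloudWatch Embedded Metric Format (EMF), a JSON structure from which CloudWatch Logs extracts metrics while keeping the record searchable as a log line. It illustrates the instrumentation idea under stated assumptions and is not part of the IBM offering: the namespace `DemoApp`, the `Service` dimension, and the `Latency` metric are hypothetical names, and the record is simply printed to standard output, as an AWS Lambda function would do (Lambda forwards standard output to CloudWatch Logs).

```python
import json
import time


def emf_record(namespace, service, metric_name, value, unit="Milliseconds"):
    """Build one CloudWatch Embedded Metric Format (EMF) record as a dict.

    CloudWatch Logs extracts `metric_name` as a metric in `namespace`,
    with `Service` as its only dimension.
    """
    return {
        "_aws": {
            # EMF expects milliseconds since the Unix epoch.
            "Timestamp": int(time.time() * 1000),
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    # Each inner list is one set of dimension keys.
                    "Dimensions": [["Service"]],
                    "Metrics": [{"Name": metric_name, "Unit": unit}],
                }
            ],
        },
        # Dimension values and metric values sit at the top level of the record.
        "Service": service,
        metric_name: value,
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    record = emf_record("DemoApp", "checkout", "Latency", 187)
    # One JSON object per line on stdout.
    print(json.dumps(record))
```

Emitting metrics this way keeps logs and metrics correlated in a single record, which is the kind of data an AI/ML layer can later analyze.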

Day 2 cloud operations are critical for managing applications and infrastructure beyond the initial setup and configuration of cloud resources. Hence, considering Day 2 operational design as part of the initial setup is vital for ensuring mission-critical applications are available to meet business demands.

Typical Day 2 responsibilities include the following:

  • Ensuring the reliability and resiliency of the underlying hybrid cloud infrastructure.
  • Supporting application availability through service-level agreements (SLAs).
  • Maintaining the security posture of the underlying infrastructure.
  • Optimizing cloud consumption costs and avoiding waste.
  • Being compliant with governance policies and regulatory requirements.

The traditional approaches to Day 2 operations, however, are not built to consider the distributed nature of the application and underlying infrastructure spanning various AWS Availability Zones (AZs) and regions.

In the case of hybrid cloud, the application and infrastructure footprint spans multiple data centers. Because traditional infrastructure lacked programmability, the use of automation was limited.

In the cloud, however, observability data provides insights into applications and the distributed infrastructure. These insights can be used to keep the infrastructure resilient and applications available as defined by the SLA. Programmability also allows enterprises to apply large-scale automation to proactively remediate potential issues caused by anomalies.
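As an illustration of programmable remediation, the sketch below maps a detected anomaly type to an AWS Systems Manager Automation runbook and builds the request for starting it. The anomaly type name and the mapping table are hypothetical; `AWS-RestartEC2Instance` is an AWS-owned Automation document that takes an `InstanceId` list. The returned dictionary matches the keyword arguments of boto3's `ssm.start_automation_execution`, which is shown commented out so the sketch runs without AWS credentials.

```python
# Hypothetical mapping from an anomaly type (as produced by some detector)
# to an AWS-owned Systems Manager Automation runbook.
RUNBOOKS = {
    "instance_unresponsive": "AWS-RestartEC2Instance",
}


def build_remediation_request(anomaly_type, instance_id):
    """Return keyword arguments for ssm.start_automation_execution,
    or None when no runbook is mapped (the incident should go to a human)."""
    document_name = RUNBOOKS.get(anomaly_type)
    if document_name is None:
        return None
    return {
        "DocumentName": document_name,
        # Automation parameters are passed as lists of strings.
        "Parameters": {"InstanceId": [instance_id]},
    }


if __name__ == "__main__":
    request = build_remediation_request("instance_unresponsive", "i-0123456789abcdef0")
    print(request)
    # With AWS credentials and an appropriate IAM role, the request could be sent:
    # import boto3
    # boto3.client("ssm").start_automation_execution(**request)
```

Returning `None` for unmapped anomalies keeps automation limited to known, well-understood fixes and leaves everything else to an operator.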

Human operators and traditional rules-based event automation are of limited help in managing cloud infrastructure, as they can only troubleshoot or predict problems in known failure domains. By drawing on the vast amount of observability data, along with data stored in siloed systems such as help desk tools, AI/ML models can proactively predict potential problems in known failure domains and help identify unknown failures.

Traditional human-centric and rules-based approaches to monitoring and logging cannot detect unknown component failures or how they impact the system as a whole. A more holistic approach to observability is needed: one that uses AI/ML models to analyze large volumes of observability data, consisting of logs, metrics, and traces, and then applies automation techniques to remediate issues and failures.
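To make the contrast with fixed-threshold rules concrete, the sketch below flags metric points that deviate from their own recent history using a rolling z-score. This is a deliberately simple statistical stand-in, assumed for illustration; it is not how Amazon DevOps Guru or IBM's ML models detect anomalies. The window size, threshold, and latency series are hypothetical.

```python
from statistics import mean, pstdev


def zscore_anomalies(values, window=5, threshold=3.0):
    """Return indices whose value deviates from the mean of the preceding
    `window` points by more than `threshold` population standard deviations."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change at all as anomalous.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    # Hypothetical per-minute latency samples in milliseconds.
    latency_ms = [100, 102, 98, 101, 99, 100, 103, 97, 450, 101]
    print(zscore_anomalies(latency_ms))
```

Running it flags only index 8, the 450 ms spike, even though no one configured a latency threshold in advance; the baseline is learned from the data itself.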

Introducing IBM Consulting Platform Services on AWS

IBM Consulting, in collaboration with AWS, is launching next-generation application management and support (AMS) capabilities, called IBM Consulting Platform Services on AWS, built using AWS observability and AWS cloud management services.

The next-generation AMS covers both application development and application operations support use cases. The offering helps customers:

  • Prevent business impact due to application outages.
  • Improve efficiency of IT and application operations.
  • Optimize infrastructure usage.
  • Accelerate migration and modernization.
  • Increase customer satisfaction through proactive support.

The new platform uses AI/ML assets built by IBM Consulting and AWS cloud-native services such as Amazon CloudWatch, Amazon DevOps Guru, AWS X-Ray, Amazon Lex, and AWS Systems Manager to help enterprise customers implement the following use cases:

  • Observability and anomaly detection: Helps reduce toil, prevent outages, and improve application reliability.
  • Root cause analysis (RCA): Captures root causes and generates insights for incidents.
  • Predicting SLA breaches: Performs predictive analytics on the SLA adherence of incidents and uses text analytics to predict an incident's issue category.
  • Incident automation: Helps automate incident and patch management to reduce overall resolution time while improving IT operations efficiency and productivity, as well as application reliability and security.
  • Event integration: Helps integrate events in either a centralized or a non-centralized way.
  • Knowledge management and incident lifecycle management: Identifies similar incidents that were resolved earlier and predicts the owner group, the best resolver group for a ticket, and the best assignee for a particular incident. This increases IT operations efficiency and productivity, shortens time to resolution, and improves customer satisfaction.
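The knowledge management use case relies on finding previously resolved incidents that resemble a new one. As a minimal sketch, assuming each incident has a plain-text summary and a known resolver group, the code below ranks resolved incidents by bag-of-words cosine similarity and suggests the resolver group of the closest match. The incident records and field names (`summary`, `resolver_group`) are hypothetical, and this simple technique is a stand-in rather than IBM's models; production systems would typically use richer text representations than raw word counts.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(new_summary, resolved, top_k=1):
    """Return up to top_k resolved incidents most similar to new_summary,
    skipping incidents that share no words with it."""
    query = Counter(tokenize(new_summary))
    scored = [(cosine(query, Counter(tokenize(inc["summary"]))), inc) for inc in resolved]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [inc for score, inc in scored[:top_k] if score > 0]


if __name__ == "__main__":
    # Hypothetical resolved incidents.
    resolved = [
        {"id": "INC-1", "summary": "Checkout API latency high after deployment",
         "resolver_group": "payments-sre"},
        {"id": "INC-2", "summary": "Disk full on reporting database host",
         "resolver_group": "dba"},
    ]
    hits = most_similar("High latency on checkout API", resolved)
    if hits:
        print("Suggested resolver group:", hits[0]["resolver_group"])
```

For the new ticket above, INC-1 shares four words with the query and ranks first, so its resolver group is suggested for routing.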

Solutions Architecture

The diagram below describes how IBM Consulting Platform Services on AWS leverages AWS observability and AWS cloud management services.

Figure 1 – Architecture view of IBM Consulting Platform Services on AWS.

  1. Infrastructure and application lifecycle events from the AWS-native DevOps toolchain are integrated, through Amazon EventBridge, with the platform hosted on Amazon Elastic Kubernetes Service (Amazon EKS).
  2. AWS Config rules track the configurations of cloud resources.
  3. Probable root cause analysis is carried out on the observability data using Amazon DevOps Guru and ML models deployed in the EKS cluster.
  4. Infrastructure and application health events are integrated with EventBridge for event analysis and incident management.
  5. Application alerts breaching thresholds are integrated with EventBridge from CloudWatch alarms.
  6. Amazon EventBridge is leveraged to gather state changes from various AWS services.
  7. Application performance data is collected from CloudWatch and AWS X-Ray. Application API calls recorded by AWS CloudTrail are also ingested to derive meaningful insights.
  8. IBM uses AWS Certificate Manager for managing certificates required for securing the data in transit.
  9. The platform also uses AWS Key Management Service (AWS KMS) for managing the encryption keys required for securing the data at rest.
  10. All user interfaces, the frontend, the data pipeline, and the various ML models are deployed in Kubernetes-based containers managed by EKS.
  11. Amazon Elastic File System (Amazon EFS) is used as persistent storage for EKS cluster nodes.
  12. The platform is integrated with an external identity provider (IdP) for handling federation and single sign-on (SSO).
  13. AWS Systems Manager Patch Manager is used to automate patch management for both infrastructure and applications.
  14. AWS Systems Manager is also used to automate operations for applications running on Amazon Elastic Compute Cloud (Amazon EC2), AWS serverless services, or containers, using Systems Manager Automation runbooks.
  15. AWS Transit Gateway is used as a networking hub for IBM to communicate with customers’ application virtual private clouds (VPCs).
  16. IBM uses AWS AppConfig and Amazon CloudWatch Evidently to capture insights from application A/B testing and performance.
  17. IBM uses Amazon CodeGuru to understand code-level performance issues.
  18. IBM uses AWS CloudFormation for deploying the entire platform architecture, and integration is performed using EventBridge notifications.
  19. IBM ChatOps engine integrates with Amazon Lex to provide natural language understanding (NLU)-based conversation for ops engineers.
  20. IBM ChatOps also integrates with third-party collaboration tools such as Slack, which serve as a support channel for ops engineers during incident resolution.
  21. IBM uses data from ITSM tools like ServiceNow for performing incident management and incident analysis.
  22. AWS Backup is used to back up and archive IBM configuration and databases for point-in-time recovery and business continuity.
  23. AWS Identity and Access Management (IAM) defines the roles and delegated permissions for building the solution and its related components.
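Steps 4 through 6 of the architecture route alarms and state changes through EventBridge rules. The event pattern below is one plausible rule filter, selecting CloudWatch alarms that enter the `ALARM` state; `source`, `detail-type`, and `detail.state.value` are fields of the CloudWatch Alarm State Change event. The `matches` function is a local simplification written for illustration and assumed to cover only exact-value matching on scalar fields; it is not EventBridge's matching engine, which also supports prefix, numeric, and other content filters. For checking a pattern against real events, EventBridge itself provides a `TestEventPattern` API.

```python
import json

# EventBridge event pattern selecting CloudWatch alarms that enter ALARM.
ALARM_PATTERN = {
    "source": ["aws.cloudwatch"],
    "detail-type": ["CloudWatch Alarm State Change"],
    "detail": {"state": {"value": ["ALARM"]}},
}


def matches(pattern, event):
    """Simplified local matcher: every pattern key must be present in the
    event; nested dicts are matched recursively, and leaf lists hold the
    allowed values for a scalar event field. EventBridge content filters
    (prefix, numeric, exists, and so on) and array-valued event fields
    are intentionally not supported."""
    for key, allowed in pattern.items():
        if key not in event:
            return False
        value = event[key]
        if isinstance(allowed, dict):
            if not isinstance(value, dict) or not matches(allowed, value):
                return False
        elif value not in allowed:
            return False
    return True


if __name__ == "__main__":
    # Trimmed-down alarm event; the alarm name is hypothetical.
    event = {
        "source": "aws.cloudwatch",
        "detail-type": "CloudWatch Alarm State Change",
        "detail": {"alarmName": "checkout-latency-high", "state": {"value": "ALARM"}},
    }
    print(matches(ALARM_PATTERN, event))
    # Rule APIs take the pattern as a JSON string.
    print(json.dumps(ALARM_PATTERN))
```

Filtering on `detail.state.value` means transitions back to `OK` never reach the incident pipeline, which helps reduce the alert noise described earlier.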

Throughout 2023, IBM and AWS also plan to offer clients several artificial intelligence for IT operations (AIOps) and observability software options as part of the IBM Consulting Platform Services offering, including IBM Instana Observability, IBM Turbonomic Application Resource Management, and IBM Cloud Pak for Watson AIOps.

Conclusion

In this post, we outlined how IBM Consulting Platform Services on AWS helps customers prioritize outage alerts for business-critical applications, improve application development and operations efficiency with a 35% improvement in mean time to resolve (MTTR) tickets, and shorten the root cause analysis cycle by 20-40%.
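For readers less familiar with the metric, MTTR is the average elapsed time between an incident being opened and being resolved, and an improvement is usually expressed as a percentage reduction against a baseline. The sketch below computes both from hypothetical incident records with assumed `opened` and `resolved` fields; it illustrates the arithmetic only and does not reproduce how the figures above were measured.

```python
from datetime import datetime


def mean_time_to_resolve(incidents):
    """Average hours from 'opened' to 'resolved', ignoring open incidents."""
    durations = [
        (inc["resolved"] - inc["opened"]).total_seconds() / 3600
        for inc in incidents
        if inc.get("resolved") is not None
    ]
    if not durations:
        raise ValueError("no resolved incidents to average")
    return sum(durations) / len(durations)


def percent_reduction(baseline, current):
    """Relative improvement of `current` over `baseline`, in percent."""
    return (baseline - current) / baseline * 100


if __name__ == "__main__":
    # Hypothetical incidents: 2 h and 4 h to resolve, plus one still open.
    incidents = [
        {"opened": datetime(2023, 1, 1, 9), "resolved": datetime(2023, 1, 1, 11)},
        {"opened": datetime(2023, 1, 2, 8), "resolved": datetime(2023, 1, 2, 12)},
        {"opened": datetime(2023, 1, 3, 10), "resolved": None},
    ]
    print(f"MTTR: {mean_time_to_resolve(incidents):.1f} h")
    # A hypothetical baseline MTTR of 10 h reduced to 6.5 h.
    print(f"Reduction: {percent_reduction(10.0, 6.5):.0f}%")
```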

IBM Consulting Platform Services also helps customers reduce total cost of ownership (TCO) by 15-40% depending on the maturity of the enterprise. By bringing in AI-driven smart operations, enterprises can streamline their Day 2 operations to maximize their cloud investments.

Visit AWS Cloud Operations to learn more about cloud management and governance tools that encourage faster innovation and maintain control over cost, compliance, and security. You can also engage with IBM Consulting for a proof of concept (PoC) or demo.

IBM – AWS Partner Spotlight

IBM is an AWS Premier Tier Services Partner and MSP that offers comprehensive service capabilities addressing both business and technology challenges that clients face today.
